{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "pycharm": {
     "name": "#%% md\n"
    },
    "tags": [
     "remove-cell"
    ]
   },
   "source": [
    "# 去标识\n",
    "\n",
    "请点击[这里](https://github.com/uvm-plaid/programming-dp/raw/master/notebooks/adult_with_pii.csv)下载数据集，并将下载得到的数据集放置在与本章Jupyter笔记本相同的目录下。\n",
    "\n",
    "此数据集是根据人口普查数据修改得到的。数据集中的个人标识信息（Personal Identifiable Information，PII）经过了虚构处理。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    },
    "tags": [
     "remove-cell"
    ]
   },
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "import numpy as np\n",
    "from mplfonts.bin.cli import init\n",
    "init()\n",
    "from mplfonts import use_font\n",
    "use_font('SimHei')\n",
    "import matplotlib.pyplot as plt\n",
    "# plt.style.use('seaborn-whitegrid')\n",
    "plt.style.use('fivethirtyeight')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    },
    "tags": [
     "remove-cell"
    ]
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Name</th>\n",
       "      <th>DOB</th>\n",
       "      <th>SSN</th>\n",
       "      <th>Zip</th>\n",
       "      <th>Age</th>\n",
       "      <th>Workclass</th>\n",
       "      <th>fnlwgt</th>\n",
       "      <th>Education</th>\n",
       "      <th>Education-Num</th>\n",
       "      <th>Marital Status</th>\n",
       "      <th>Occupation</th>\n",
       "      <th>Relationship</th>\n",
       "      <th>Race</th>\n",
       "      <th>Sex</th>\n",
       "      <th>Capital Gain</th>\n",
       "      <th>Capital Loss</th>\n",
       "      <th>Hours per week</th>\n",
       "      <th>Country</th>\n",
       "      <th>Target</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Karrie Trusslove</td>\n",
       "      <td>9/7/1967</td>\n",
       "      <td>732-14-6110</td>\n",
       "      <td>64152</td>\n",
       "      <td>39</td>\n",
       "      <td>State-gov</td>\n",
       "      <td>77516</td>\n",
       "      <td>Bachelors</td>\n",
       "      <td>13</td>\n",
       "      <td>Never-married</td>\n",
       "      <td>Adm-clerical</td>\n",
       "      <td>Not-in-family</td>\n",
       "      <td>White</td>\n",
       "      <td>Male</td>\n",
       "      <td>2174</td>\n",
       "      <td>0</td>\n",
       "      <td>40</td>\n",
       "      <td>United-States</td>\n",
       "      <td>&lt;=50K</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Brandise Tripony</td>\n",
       "      <td>6/7/1988</td>\n",
       "      <td>150-19-2766</td>\n",
       "      <td>61523</td>\n",
       "      <td>50</td>\n",
       "      <td>Self-emp-not-inc</td>\n",
       "      <td>83311</td>\n",
       "      <td>Bachelors</td>\n",
       "      <td>13</td>\n",
       "      <td>Married-civ-spouse</td>\n",
       "      <td>Exec-managerial</td>\n",
       "      <td>Husband</td>\n",
       "      <td>White</td>\n",
       "      <td>Male</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>13</td>\n",
       "      <td>United-States</td>\n",
       "      <td>&lt;=50K</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Brenn McNeely</td>\n",
       "      <td>8/6/1991</td>\n",
       "      <td>725-59-9860</td>\n",
       "      <td>95668</td>\n",
       "      <td>38</td>\n",
       "      <td>Private</td>\n",
       "      <td>215646</td>\n",
       "      <td>HS-grad</td>\n",
       "      <td>9</td>\n",
       "      <td>Divorced</td>\n",
       "      <td>Handlers-cleaners</td>\n",
       "      <td>Not-in-family</td>\n",
       "      <td>White</td>\n",
       "      <td>Male</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>40</td>\n",
       "      <td>United-States</td>\n",
       "      <td>&lt;=50K</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>Dorry Poter</td>\n",
       "      <td>4/6/2009</td>\n",
       "      <td>659-57-4974</td>\n",
       "      <td>25503</td>\n",
       "      <td>53</td>\n",
       "      <td>Private</td>\n",
       "      <td>234721</td>\n",
       "      <td>11th</td>\n",
       "      <td>7</td>\n",
       "      <td>Married-civ-spouse</td>\n",
       "      <td>Handlers-cleaners</td>\n",
       "      <td>Husband</td>\n",
       "      <td>Black</td>\n",
       "      <td>Male</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>40</td>\n",
       "      <td>United-States</td>\n",
       "      <td>&lt;=50K</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>Dick Honnan</td>\n",
       "      <td>9/16/1951</td>\n",
       "      <td>220-93-3811</td>\n",
       "      <td>75387</td>\n",
       "      <td>28</td>\n",
       "      <td>Private</td>\n",
       "      <td>338409</td>\n",
       "      <td>Bachelors</td>\n",
       "      <td>13</td>\n",
       "      <td>Married-civ-spouse</td>\n",
       "      <td>Prof-specialty</td>\n",
       "      <td>Wife</td>\n",
       "      <td>Black</td>\n",
       "      <td>Female</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>40</td>\n",
       "      <td>Cuba</td>\n",
       "      <td>&lt;=50K</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "               Name        DOB          SSN    Zip  Age         Workclass  \\\n",
       "0  Karrie Trusslove   9/7/1967  732-14-6110  64152   39         State-gov   \n",
       "1  Brandise Tripony   6/7/1988  150-19-2766  61523   50  Self-emp-not-inc   \n",
       "2     Brenn McNeely   8/6/1991  725-59-9860  95668   38           Private   \n",
       "3       Dorry Poter   4/6/2009  659-57-4974  25503   53           Private   \n",
       "4       Dick Honnan  9/16/1951  220-93-3811  75387   28           Private   \n",
       "\n",
       "   fnlwgt  Education  Education-Num      Marital Status         Occupation  \\\n",
       "0   77516  Bachelors             13       Never-married       Adm-clerical   \n",
       "1   83311  Bachelors             13  Married-civ-spouse    Exec-managerial   \n",
       "2  215646    HS-grad              9            Divorced  Handlers-cleaners   \n",
       "3  234721       11th              7  Married-civ-spouse  Handlers-cleaners   \n",
       "4  338409  Bachelors             13  Married-civ-spouse     Prof-specialty   \n",
       "\n",
       "    Relationship   Race     Sex  Capital Gain  Capital Loss  Hours per week  \\\n",
       "0  Not-in-family  White    Male          2174             0              40   \n",
       "1        Husband  White    Male             0             0              13   \n",
       "2  Not-in-family  White    Male             0             0              40   \n",
       "3        Husband  Black    Male             0             0              40   \n",
       "4           Wife  Black  Female             0             0              40   \n",
       "\n",
       "         Country Target  \n",
       "0  United-States  <=50K  \n",
       "1  United-States  <=50K  \n",
       "2  United-States  <=50K  \n",
       "3  United-States  <=50K  \n",
       "4           Cuba  <=50K  "
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "adult = pd.read_csv(\"adult_with_pii.csv\")\n",
    "adult.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "pycharm": {
     "name": "#%% md\n"
    }
   },
   "source": [
    "# 去标识"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "pycharm": {
     "name": "#%% md\n"
    }
   },
   "source": [
    "*去标识*（De-identification）是指从数据集中删除*标识信息*的过程。有的地方会把*去标识*这一术语与*匿名*（Anonymization）和*假名*（Pseudonymization）这两个术语看作同义词，表达相同的概念。\n",
    "\n",
    "```{admonition} 学习目标\n",
    "阅读本章后，您将能够：\n",
    "- 定义并理解下述概念：\n",
    "  - 去标识\n",
    "  - 重标识\n",
    "  - 标识信息 / 个人标识信息\n",
    "  - 关联攻击\n",
    "  - 聚合与聚合统计\n",
    "  - 差分攻击\n",
    "- 实施一次关联攻击\n",
    "- 实施一次差分攻击\n",
    "- 理解去标识技术的局限性\n",
    "- 理解聚合统计的局限性\n",
    "```\n",
    "\n",
    "我们尚不能严谨地定义什么是标识信息。我们通常将标识信息理解为：在日常生活中可以唯一标识我们自己的信息。从这个理解角度看，姓名、地址、电话号码、电子邮箱等都属于标识信息。我们稍后将会了解到，*不可能*为标识信息给出严谨的定义，因为*所有*信息都可以用来标识个体。一般来说，*个人标识信息*（Personally Identifiable Information，PII）和标识信息这两个术语是同义词，表达相同的概念。\n",
    "\n",
    "我们如何才能对信息去标识呢？很简单，我们直接移除包含标识信息的列就好了！"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>DOB</th>\n",
       "      <th>Zip</th>\n",
       "      <th>Age</th>\n",
       "      <th>Workclass</th>\n",
       "      <th>fnlwgt</th>\n",
       "      <th>Education</th>\n",
       "      <th>Education-Num</th>\n",
       "      <th>Martial Status</th>\n",
       "      <th>Occupation</th>\n",
       "      <th>Relationship</th>\n",
       "      <th>Race</th>\n",
       "      <th>Sex</th>\n",
       "      <th>Capital Gain</th>\n",
       "      <th>Capital Loss</th>\n",
       "      <th>Hours per week</th>\n",
       "      <th>Country</th>\n",
       "      <th>Target</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>9/7/1967</td>\n",
       "      <td>64152</td>\n",
       "      <td>39</td>\n",
       "      <td>State-gov</td>\n",
       "      <td>77516</td>\n",
       "      <td>Bachelors</td>\n",
       "      <td>13</td>\n",
       "      <td>Never-married</td>\n",
       "      <td>Adm-clerical</td>\n",
       "      <td>Not-in-family</td>\n",
       "      <td>White</td>\n",
       "      <td>Male</td>\n",
       "      <td>2174</td>\n",
       "      <td>0</td>\n",
       "      <td>40</td>\n",
       "      <td>United-States</td>\n",
       "      <td>&lt;=50K</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "        DOB    Zip  Age  Workclass  fnlwgt  Education  Education-Num  \\\n",
       "0  9/7/1967  64152   39  State-gov   77516  Bachelors             13   \n",
       "\n",
       "  Martial Status    Occupation   Relationship   Race   Sex  Capital Gain  \\\n",
       "0  Never-married  Adm-clerical  Not-in-family  White  Male          2174   \n",
       "\n",
       "   Capital Loss  Hours per week        Country Target  \n",
       "0             0              40  United-States  <=50K  "
      ]
     },
     "execution_count": 12,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "adult_data = adult.copy().drop(columns=['Name', 'SSN'])\n",
    "adult_pii = adult[['Name', 'SSN', 'DOB', 'Zip']]\n",
    "adult_data.head(1)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "pycharm": {
     "name": "#%% md\n"
    }
   },
   "source": [
    "我们将数据中一部分个体的标识信息保留了下来。我们随后将把这些保留的标识信息作为*辅助数据*（Auxiliary Data）来实施一次*重标识*（Re-identification）攻击。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "pycharm": {
     "name": "#%% md\n"
    }
   },
   "source": [
    "## 关联攻击\n",
    "\n",
    "假设我们想从刚刚得到的去标识数据中获取某个朋友的收入信息。去标识数据中的姓名一列已经被移除了，但我们碰巧知道能帮助我们标识出这位朋友的一些辅助信息。我们的这位朋友叫凯莉·特鲁斯洛夫（Karrie Trusslove），我们知道凯莉的出生日期和邮政编码。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "pycharm": {
     "name": "#%% md\n"
    }
   },
   "source": [
    "我们尝试攻击的数据集与我们知道的一些辅助信息之间包含一些重叠列，我们将应用这些重叠列来实施一次简单的*关联攻击*（Linkage Attack）。在本例中，两个数据集都包含出生日期和邮政编码列。我们在尝试攻击的数据集中查找出能与凯莉的出生日期和邮政编码匹配上的行。数据库领域将此类匹配操作称为*关联*（JOIN）两个数据表。我们可以使用Pandas的`merge`函数实现此操作。如果我们只能检索到唯一一行数据，我们就从尝试攻击的数据集中找到了凯莉所属的行。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Name</th>\n",
       "      <th>SSN</th>\n",
       "      <th>DOB</th>\n",
       "      <th>Zip</th>\n",
       "      <th>Age</th>\n",
       "      <th>Workclass</th>\n",
       "      <th>fnlwgt</th>\n",
       "      <th>Education</th>\n",
       "      <th>Education-Num</th>\n",
       "      <th>Martial Status</th>\n",
       "      <th>Occupation</th>\n",
       "      <th>Relationship</th>\n",
       "      <th>Race</th>\n",
       "      <th>Sex</th>\n",
       "      <th>Capital Gain</th>\n",
       "      <th>Capital Loss</th>\n",
       "      <th>Hours per week</th>\n",
       "      <th>Country</th>\n",
       "      <th>Target</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Karrie Trusslove</td>\n",
       "      <td>732-14-6110</td>\n",
       "      <td>9/7/1967</td>\n",
       "      <td>64152</td>\n",
       "      <td>39</td>\n",
       "      <td>State-gov</td>\n",
       "      <td>77516</td>\n",
       "      <td>Bachelors</td>\n",
       "      <td>13</td>\n",
       "      <td>Never-married</td>\n",
       "      <td>Adm-clerical</td>\n",
       "      <td>Not-in-family</td>\n",
       "      <td>White</td>\n",
       "      <td>Male</td>\n",
       "      <td>2174</td>\n",
       "      <td>0</td>\n",
       "      <td>40</td>\n",
       "      <td>United-States</td>\n",
       "      <td>&lt;=50K</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "               Name          SSN       DOB    Zip  Age  Workclass  fnlwgt  \\\n",
       "0  Karrie Trusslove  732-14-6110  9/7/1967  64152   39  State-gov   77516   \n",
       "\n",
       "   Education  Education-Num Martial Status    Occupation   Relationship  \\\n",
       "0  Bachelors             13  Never-married  Adm-clerical  Not-in-family   \n",
       "\n",
       "    Race   Sex  Capital Gain  Capital Loss  Hours per week        Country  \\\n",
       "0  White  Male          2174             0              40  United-States   \n",
       "\n",
       "  Target  \n",
       "0  <=50K  "
      ]
     },
     "execution_count": 17,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "karries_row = adult_pii[adult_pii['Name'] == 'Karrie Trusslove']\n",
    "pd.merge(karries_row, adult_data, left_on=['DOB', 'Zip'], right_on=['DOB', 'Zip'])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "pycharm": {
     "name": "#%% md\n"
    }
   },
   "source": [
    "我们确实只找到了一行匹配上的数据。通过使用辅助数据，我们在去标识数据集中重标识出了一个个体。我们可以根据重标识攻击结果进一步推断出凯莉的收入小于5万美元。"
   ]
  },
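  {
   "cell_type": "markdown",
   "metadata": {
    "pycharm": {
     "name": "#%% md\n"
    }
   },
   "source": [
    "The uniqueness check behind this conclusion can also be made explicit. The cell below is only a minimal sketch: it assumes the `pd`, `karries_row`, and `adult_data` objects defined in the earlier cells, and the name `linked` is introduced here purely for illustration. It repeats the join and reads off the income bracket only when exactly one row matches."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "outputs": [],
   "source": [
    "# Illustrative sketch: reuses karries_row and adult_data from the cells above\n",
    "linked = pd.merge(karries_row, adult_data, on=['DOB', 'Zip'])\n",
    "\n",
    "if len(linked) == 1:\n",
    "    # Exactly one candidate row: the target is re-identified\n",
    "    print('Unique match; income bracket:', linked['Target'].iloc[0])\n",
    "else:\n",
    "    # Zero or several candidates: the auxiliary data does not single out one row\n",
    "    print(len(linked), 'candidate rows; more auxiliary data is needed')"
   ]
  },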
  {
   "cell_type": "markdown",
   "metadata": {
    "pycharm": {
     "name": "#%% md\n"
    }
   },
   "source": [
    "### 重标识出凯莉有多难？"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "pycharm": {
     "name": "#%% md\n"
    }
   },
   "source": [
    "这是一个虚构的攻击场景，但在实际场景中实施关联攻击的难度也是出乎意料的低。有多低呢？事实证明，在绝大多数情况下，只需要一个数据点作为辅助信息就足以重标识出一行数据！"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Name</th>\n",
       "      <th>SSN</th>\n",
       "      <th>DOB_x</th>\n",
       "      <th>Zip</th>\n",
       "      <th>DOB_y</th>\n",
       "      <th>Age</th>\n",
       "      <th>Workclass</th>\n",
       "      <th>fnlwgt</th>\n",
       "      <th>Education</th>\n",
       "      <th>Education-Num</th>\n",
       "      <th>Martial Status</th>\n",
       "      <th>Occupation</th>\n",
       "      <th>Relationship</th>\n",
       "      <th>Race</th>\n",
       "      <th>Sex</th>\n",
       "      <th>Capital Gain</th>\n",
       "      <th>Capital Loss</th>\n",
       "      <th>Hours per week</th>\n",
       "      <th>Country</th>\n",
       "      <th>Target</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Karrie Trusslove</td>\n",
       "      <td>732-14-6110</td>\n",
       "      <td>9/7/1967</td>\n",
       "      <td>64152</td>\n",
       "      <td>9/7/1967</td>\n",
       "      <td>39</td>\n",
       "      <td>State-gov</td>\n",
       "      <td>77516</td>\n",
       "      <td>Bachelors</td>\n",
       "      <td>13</td>\n",
       "      <td>Never-married</td>\n",
       "      <td>Adm-clerical</td>\n",
       "      <td>Not-in-family</td>\n",
       "      <td>White</td>\n",
       "      <td>Male</td>\n",
       "      <td>2174</td>\n",
       "      <td>0</td>\n",
       "      <td>40</td>\n",
       "      <td>United-States</td>\n",
       "      <td>&lt;=50K</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "               Name          SSN     DOB_x    Zip     DOB_y  Age  Workclass  \\\n",
       "0  Karrie Trusslove  732-14-6110  9/7/1967  64152  9/7/1967   39  State-gov   \n",
       "\n",
       "   fnlwgt  Education  Education-Num Martial Status    Occupation  \\\n",
       "0   77516  Bachelors             13  Never-married  Adm-clerical   \n",
       "\n",
       "    Relationship   Race   Sex  Capital Gain  Capital Loss  Hours per week  \\\n",
       "0  Not-in-family  White  Male          2174             0              40   \n",
       "\n",
       "         Country Target  \n",
       "0  United-States  <=50K  "
      ]
     },
     "execution_count": 18,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "pd.merge(karries_row, adult_data, left_on=['Zip'], right_on=['Zip'])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "pycharm": {
     "name": "#%% md\n"
    }
   },
   "source": [
    "邮政编码*本身*就足以让我们重标识出凯莉了。那出生日期呢？"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Name</th>\n",
       "      <th>SSN</th>\n",
       "      <th>DOB</th>\n",
       "      <th>Zip_x</th>\n",
       "      <th>Zip_y</th>\n",
       "      <th>Age</th>\n",
       "      <th>Workclass</th>\n",
       "      <th>fnlwgt</th>\n",
       "      <th>Education</th>\n",
       "      <th>Education-Num</th>\n",
       "      <th>Martial Status</th>\n",
       "      <th>Occupation</th>\n",
       "      <th>Relationship</th>\n",
       "      <th>Race</th>\n",
       "      <th>Sex</th>\n",
       "      <th>Capital Gain</th>\n",
       "      <th>Capital Loss</th>\n",
       "      <th>Hours per week</th>\n",
       "      <th>Country</th>\n",
       "      <th>Target</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Karrie Trusslove</td>\n",
       "      <td>732-14-6110</td>\n",
       "      <td>9/7/1967</td>\n",
       "      <td>64152</td>\n",
       "      <td>64152</td>\n",
       "      <td>39</td>\n",
       "      <td>State-gov</td>\n",
       "      <td>77516</td>\n",
       "      <td>Bachelors</td>\n",
       "      <td>13</td>\n",
       "      <td>Never-married</td>\n",
       "      <td>Adm-clerical</td>\n",
       "      <td>Not-in-family</td>\n",
       "      <td>White</td>\n",
       "      <td>Male</td>\n",
       "      <td>2174</td>\n",
       "      <td>0</td>\n",
       "      <td>40</td>\n",
       "      <td>United-States</td>\n",
       "      <td>&lt;=50K</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Karrie Trusslove</td>\n",
       "      <td>732-14-6110</td>\n",
       "      <td>9/7/1967</td>\n",
       "      <td>64152</td>\n",
       "      <td>67306</td>\n",
       "      <td>64</td>\n",
       "      <td>Private</td>\n",
       "      <td>171373</td>\n",
       "      <td>11th</td>\n",
       "      <td>7</td>\n",
       "      <td>Widowed</td>\n",
       "      <td>Farming-fishing</td>\n",
       "      <td>Unmarried</td>\n",
       "      <td>White</td>\n",
       "      <td>Female</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>40</td>\n",
       "      <td>United-States</td>\n",
       "      <td>&lt;=50K</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Karrie Trusslove</td>\n",
       "      <td>732-14-6110</td>\n",
       "      <td>9/7/1967</td>\n",
       "      <td>64152</td>\n",
       "      <td>62254</td>\n",
       "      <td>46</td>\n",
       "      <td>Self-emp-not-inc</td>\n",
       "      <td>119944</td>\n",
       "      <td>Masters</td>\n",
       "      <td>14</td>\n",
       "      <td>Married-civ-spouse</td>\n",
       "      <td>Exec-managerial</td>\n",
       "      <td>Husband</td>\n",
       "      <td>White</td>\n",
       "      <td>Male</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>50</td>\n",
       "      <td>United-States</td>\n",
       "      <td>&gt;50K</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "               Name          SSN       DOB  Zip_x  Zip_y  Age  \\\n",
       "0  Karrie Trusslove  732-14-6110  9/7/1967  64152  64152   39   \n",
       "1  Karrie Trusslove  732-14-6110  9/7/1967  64152  67306   64   \n",
       "2  Karrie Trusslove  732-14-6110  9/7/1967  64152  62254   46   \n",
       "\n",
       "          Workclass  fnlwgt  Education  Education-Num      Martial Status  \\\n",
       "0         State-gov   77516  Bachelors             13       Never-married   \n",
       "1           Private  171373       11th              7             Widowed   \n",
       "2  Self-emp-not-inc  119944    Masters             14  Married-civ-spouse   \n",
       "\n",
       "        Occupation   Relationship   Race     Sex  Capital Gain  Capital Loss  \\\n",
       "0     Adm-clerical  Not-in-family  White    Male          2174             0   \n",
       "1  Farming-fishing      Unmarried  White  Female             0             0   \n",
       "2  Exec-managerial        Husband  White    Male             0             0   \n",
       "\n",
       "   Hours per week        Country Target  \n",
       "0              40  United-States  <=50K  \n",
       "1              40  United-States  <=50K  \n",
       "2              50  United-States   >50K  "
      ]
     },
     "execution_count": 19,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "pd.merge(karries_row, adult_data, left_on=['DOB'], right_on=['DOB'])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "pycharm": {
     "name": "#%% md\n"
    }
   },
   "source": [
    "这一次返回了三行数据。我们不知道哪一行才是凯莉的数据。即便如此，我们仍然得到了很多信息！\n",
    "\n",
    "- 我们知道凯莉收入低于5万美元的概率是2/3。\n",
    "- 我们可以观察各行之前的差异，以确定哪些额外的辅助信息可以*帮助*我们进一步区分各行数据所属的个体。在本例中，性别、职业、婚姻状况都可以帮助我们进一步重标识出凯莉。"
   ]
  },
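  {
   "cell_type": "markdown",
   "metadata": {
    "pycharm": {
     "name": "#%% md\n"
    }
   },
   "source": [
    "Both observations can be read off the join result programmatically. The cell below is a hedged sketch rather than part of the attack itself: it assumes the `pd`, `karries_row`, and `adult_data` objects from the earlier cells, treats every matching row as equally likely to be Karrie's, and uses the illustrative names `candidates` and `differing`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "outputs": [],
   "source": [
    "# Illustrative sketch: rows of the de-identified data that share Karrie's date of birth\n",
    "candidates = pd.merge(karries_row, adult_data, on='DOB')\n",
    "\n",
    "# Share of candidate rows in each income bracket (assumes equally likely rows)\n",
    "print(candidates['Target'].value_counts(normalize=True))\n",
    "\n",
    "# Columns whose values differ across the candidates; knowing any of them\n",
    "# for Karrie would narrow the set of candidate rows further\n",
    "differing = [col for col in candidates.columns if candidates[col].nunique() > 1]\n",
    "print(differing)"
   ]
  },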
  {
   "cell_type": "markdown",
   "metadata": {
    "pycharm": {
     "name": "#%% md\n"
    }
   },
   "source": [
    "### 凯莉很特别吗？"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "pycharm": {
     "name": "#%% md\n"
    }
   },
   "source": [
    "在数据集中重标识出其他某个个体的难度有多大？重标识出凯莉这一特定的个体相对更难还是相对更简单呢？衡量此类攻击有效性的一个好方法是查看特定数据是否有较好的\"筛选效果\"：特定数据能否帮助我们更好地缩小目标个体所属行的范围。举个例子，数据集中拥有相同出生日期的人数多吗？\n",
    "\n",
    "在执行攻击前，我们可以先评估一下出生日期这一辅助数据会给我们带来多大的帮助。为此，我们可以查看数据集中包含\"唯一\"出生日期的个体数量。下面的直方图显示，*绝大多数*出生日期在数据集中仅出现了1次、2次或3次，有8个个体的出生日期信息是缺失的。这意味着出生日期的\"筛选效果\"相当不错。出生日期可以有效缩小个体所属行的范围。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 26,
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    },
    "tags": [
     "hide-input"
    ]
   },
   "outputs": [
    {
     "data": {
      "image/png": "iVBORw0KGgoAAAANSUhEUgAAAYsAAAEGCAYAAACUzrmNAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4yLjEsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy+j8jraAAAgAElEQVR4nO3de7xXdZ3v8dc7SUFRwEv7GFhQcWwqyssedbJHs5HCWyPWaNlxEn0wQ50cs6IzYpNhXoomzdImT0wyYWMi2QVCJ2OQXXkab6iJlxxQ0RACayO6FXW2fs4f67v1x+a391r7sn4XeD8fj99jr/Vdt/fe4v7stb5rfZciAjMzs768pt4BzMys8blYmJlZLhcLMzPL5WJhZma5XCzMzCzXsHoHKMO+++4b48ePH/D2zz77LHvsscfQBSpRM2WF5srrrOVpprzNlBUGl3flypV/jIj9qi6MiB3uc+ihh8ZgrFixYlDb11IzZY1orrzOWp5myttMWSMGlxe4M3r5verLUGZmlqvUYiHpM5Lul3SfpGslDZc0QdJtklZLuk7Srmnd3dL8mrR8fMV+zk3tD0k6uszMZma2vdKKhaSxwKeA1oh4B7ALcArwVeCyiJgIbAZmpE1mAJsj4i3AZWk9JL0tbfd24Bjg25J2KSu3mZltr+zLUMOAEZKGAbsDG4CjgOvT8gXAiWl6WponLZ8iSal9YUS8EBGPAmuAw0rObWZmFUorFhHxBHAJ8DhZkdgCrASeioiutNo6YGyaHgv8Pm3bldbfp7K9yjZmZlYDpd06K2kM2VnBBOAp4IfAsVVW7R7JUL0s66295/FmAjMBWlpaaG9v73/opLOzc1Db11IzZYXmyuus5WmmvM2UFcrLW+ZzFu8DHo2IJwEk/Rh4NzBa0rB09jAOWJ/WXwccAKxLl61GAR0V7d0qt3lFRMwD5gG0trZGW1vbgIO3t7czmO1rqZmyQnPlddbyNFPeZsoK5eUts8/iceAISbunvocpwAPACuCktM50YHGaXpLmSctvTvf9LgFOSXdLTQAmAreXmNvMzHoo7cwiIm6TdD1wF9AF3E32l/8NwEJJF6W2q9ImVwHfl7SG7IzilLSf+yUtIis0XcCZEfFSWbnNzGx7pQ73ERFzgDk9mh+hyt1MEfE8cHIv+7kYuHjIA/Zi1RNbOH32DbU63CvWzj2+5sc0MyvCT3CbmVkuFwszM8vlYmFmZrlcLMzMLJeLhZmZ5XKxMDOzXC4WZmaWy8XCzMxyuViYmVkuFwszM8vlYmFmZrlcLMzMLJeLhZmZ5XKxMDOzXC4WZmaWy8XCzMxyuViYmVkuFwszM8tVWrGQdKCkeyo+T0v6tKS9JS2TtDp9HZPWl6TLJa2RdK+kQyr2NT2tv1rS9LIym5lZdaUVi4h4KCIOioiDgEOB54CfALOB5RExEVie5gGOBSamz0zgSgBJe5O9x/twsnd3z+kuMGZmVhu1ugw1BXg4Ih4DpgELUvsC4MQ0PQ24OjK3AqMl7Q8cDSyLiI6I2AwsA46pUW4zMwMUEeUfRJoP3BUR35L0VESMrli2OSLGSFoKzI2IW1L7cuAcoA0YHhEXpfbzgK0RcUmPY8wkOyOhpaXl0IULFw4476aOLWzcOuDNB2zS2FH93qazs5ORI0eWkKYczZTXWcvTTHmbKSsMLu/kyZNXRkRrtWXDBpWqAEm7AicA5+atWqUt+mjftiFiHjAPoLW1Ndra2voXtMIV1yzm0lWl/2i2s/bUtn5v097ezmC+11prprzOWp5myttMWaG8vLW4DHUs2VnFxjS/MV1eIn3dlNrXAQdUbDcOWN9Hu5mZ1UgtisVHgWsr5pcA3Xc0TQcWV7Sflu6KOgLYEhEbgJuAqZLGpI7tqanNzMxqpNRrLZJ2B94PfLyieS6wSNIM4HHg5NR+I3AcsIbszqkzACKiQ9KFwB1pvQsioqPM3GZm
tq1Si0VEPAfs06PtT2R3R/VcN4Aze9nPfGB+GRnNzCyfn+A2M7NcLhZmZpbLxcLMzHK5WJiZWS4XCzMzy+ViYWZmuVwszMwsl4uFmZnlcrEwM7NcLhZmZpbLxcLMzHK5WJiZWS4XCzMzy5VbLCSdLGnPNP0FST+WdEj50czMrFEUObM4LyKekfQe4GhgAXBlubHMzKyRFCkWL6WvxwNXRsRiYNfyIpmZWaMpUiyekPQd4MPAjZJ2K7idmZntIIr80v8w2Tuvj4mIp4C9gf9TZOeSRku6XtLvJD0o6S8k7S1pmaTV6euYtK4kXS5pjaR7K/tFJE1P66+WNL33I5qZWRlyi0V6Neom4D2pqQtYXXD/3wR+HhFvBd4FPAjMBpZHxERgeZoHOBaYmD4zSf0ikvYG5gCHA4cBc7oLjJmZ1UaRu6HmAOcA56am1wL/VmC7vYD3AlcBRMSL6cxkGlknOenriWl6GnB1ZG4FRkvan6xTfVlEdETEZmAZcEzB78/MzIaAIqLvFaR7gIOBuyLi4NR2b0S8M2e7g4B5wANkZxUrgbOBJyJidMV6myNijKSlwNyIuCW1LycrUm3A8Ii4KLWfB2yNiEt6HG8m2RkJLS0thy5cuLDYT6CKTR1b2Lh1wJsP2KSxo/q9TWdnJyNHjiwhTTmaKa+zlqeZ8jZTVhhc3smTJ6+MiNZqy4YV2P7FiAhJASBpj4LHHQYcApwVEbdJ+iavXnKqRlXaoo/2bRsi5pEVJ1pbW6Otra1gzO1dcc1iLl1V5EcztNae2tbvbdrb2xnM91przZTXWcvTTHmbKSuUl7dIB/eidDfUaEl/B/wH8C8FtlsHrIuI29L89WTFY2O6vET6uqli/QMqth8HrO+j3czMaqRIB/clZL/ofwQcCHwxIq4osN0fgN9LOjA1TSG7JLUE6L6jaTqwOE0vAU5Ld0UdAWyJiA1kd2JNlTQmdWxPTW1mZlYjuddaJE0Afh0Ry9L8CEnjI2Jtgf2fBVwjaVfgEeAMsgK1SNIM4HHg5LTujcBxwBrgubQuEdEh6ULgjrTeBRHRUfD7MzOzIVDkwvwPgXdXzL+U2v48b8OIuAeo1lkypcq6AZzZy37mA/MLZDUzsxIU6bMYFhEvds+kaQ/3YWa2EylSLJ6UdEL3jKRpwB/Li2RmZo2myGWoT5D1O3yL7DbW3wOnlZrKzMwaSm6xiIiHgSMkjSR7iO+Z8mOZmVkjKXI31G7AXwPjgWFS9oxcRFxQajIzM2sYRS5DLQa2kA3X8UK5cczMrBEVKRbjIsID95mZ7cSK3A31G0mTSk9iZmYNq8iZxXuA0yU9SnYZSmTP0PU56qz13/jZN/R7m1mTujh9ANtVWjv3+EFtb2Y7viLF4tjSU5iZWUMrMpDgY2Sjvh6Vpp8rsp2Zme04SntTnpmZ7TiKnCF8EDgBeBYgItYDe5YZyszMGkuRYvFiGhG2v2/KMzOzHUSZb8ozM7MdRJ93Qykb2+M64K3A07z6prxlNchmZmYNos9iEREh6acRcSjgAmFmtpMqchnqVkm5b8UzM7MdV5FiMRn4T0kPS7pX0ipJ9xbZuaS1af17JN2Z2vaWtEzS6vR1TGqXpMslrUnHOaRiP9PT+qslTR/IN2pmZgNXpM/iE8BjgzjG5IiofLPebGB5RMyVNDvNn0P2pPjE9DkcuBI4XNLewByyd3kHsFLSkojYPIhMZmbWD32eWaRbZi+LiMd6fgZxzGnAgjS9ADixov3qyNxKdvfV/sDRwLKI6EgFYhngUXDNzGpIWT3oYwXpn4HvRcQd/d55NvjgZrIzgu9ExDxJT0XE6Ip1NkfEGElLgbkRcUtqX052xtEGDI+Ii1L7ecDWiLikx7FmAjMBWlpaDl24cGF/475iU8cWNm4d8OY11TKCQWedNHbU0IQpoLOzk5EjR9bseIPhrOVpprzNlBUGl3fy5MkrI6K12rIiAwlOBj4u6TGyp7j7M+rskRGxXtLrgGWSftfH
uqrSFn20b9sQMQ+YB9Da2hptbW0F4lV3xTWLuXRVkR9N/c2a1DXorGtPbRuaMAW0t7czmP82teSs5WmmvM2UFcrLW+qos2loECJik6SfAIcBGyXtHxEb0mWmTWn1dWQDFnYbB6xP7W092tsHmsnMzPqvyN1Q0cunT5L2kLRn9zQwFbgPWAJ039E0ney1raT209JdUUcAWyJiA3ATMFXSmHTn1NTUZmZmNVLkzOIGXr0cNByYADwEvD1nuxbgJ9kNVQwDfhARP5d0B9kQIjOAx4GT0/o3AscBa8iGQT8DICI6JF0IdPeZXBARHcW+PTMzGwq5xSIitnmlanr+4eMFtnsEeFeV9j8BU6q0B3BmL/uaD8zPO6aZmZWj3y8xioi7AD/RbWa2E8k9s5D02YrZ1wCHAE+WlsjMzBpOkT6LyhcddZH1YfyonDhmZtaIivRZfKkWQczMrHEVeQf3MkmVT1yPkeRbV83MdiJFOrj3i4inumfS+EyvKy+SmZk1miLF4iVJb+iekfRGCjyUZ2ZmO44iHdz/CNwi6Zdp/r2kAfvMzGznUKSD++fpQbwjyJ7i/kyP91OYmdkOrkgH9weB/46IpRHxM6BL0ol525mZ2Y6jSJ/FnIjY0j2TOrvnlBfJzMwaTZFiUW2d5njZg5mZDYkixeJOSV+X9GZJb5J0GbCy7GBmZtY4ihSLs4AXgeuAHwLP08vosGZmtmMqcjfUs5IuAi6MiGdrkMnMzBpMn2cWkj4p6XHgMeBxSY9J+mRtopmZWaPotVhI+gLwAaAtIvaJiH2AycCxaZmZme0k+jqz+BjwofTGO+CVt999GDit6AEk7SLpbklL0/wESbdJWi3pOkm7pvbd0vyatHx8xT7OTe0PSTq6f9+imZkNVp+XoSLi+SptW4GX+3GMs4EHK+a/ClwWEROBzcCM1D4D2BwRbwEuS+sh6W3AKWTv/D4G+LakXfpxfDMzG6S+isU6Sdu9K1vSUcCGIjuXNA44HvhumhdwFHB9WmUB0P00+LQ0T1o+Ja0/DVgYES9ExKPAGuCwIsc3M7Oh0dfdUJ8CFku6hey5iiB79/aRZL/Ai/gG8A+8+ra9fYCnIqIrza8DxqbpscDvASKiS9KWtP5Y4NaKfVZu8wpJM0kDHLa0tNDe3l4w4vZaRsCsSV35KzaAocg6mJ9Vf3V2dtb0eIPhrOVpprzNlBXKy9trsYiI+yW9A/hfZJeABPwK+Hi1y1M9SfoAsCkiVkpq626udqicZX1tU5l3HjAPoLW1Ndra2nquUtgV1yzm0lXN8ZD6rEldg8669tS2oQlTQHt7O4P5b1NLzlqeZsrbTFmhvLx9/pZJRWH+APd9JHCCpOOA4cBeZGcaoyUNS2cX44D1af11wAFkl7+GAaOAjor2bpXbmJlZDRR5gntAIuLciBgXEePJOqhvjohTgRXASWm16cDiNL0kzZOW3xwRkdpPSXdLTQAmAreXldvMzLZXj2st5wAL01PhdwNXpfargO9LWkN2RnEKvHI5bBHwANAFnBkRL9U+tpnZzqvXYiFpeURMkfTViDhnMAeJiHagPU0/QpW7mdIlr5N72f5i4OLBZDAzs4Hr68xif0l/SdbvsJAeHc0RcVepyczMrGH0VSy+CMwm61D+eo9lQfa8hJmZ7QT6unX2euB6SedFxIU1zGRmZg2myBDlF0o6AXhvamqPiKXlxjIzs0aSe+uspK+Qje/0QPqcndrMzGwnUeTW2eOBgyLiZQBJC8hueT23zGBmZtY4ij6UN7pielQZQczMrHEVObP4CnC3pBVkt8++F59VmJntVIp0cF8rqZ1sxFkB50TEH8oOZmZmjaPQcB8RsYFsjCYzM9sJlTaQoJmZ7ThcLMzMLFefxULSayTdV6swZmbWmPosFunZit9KekON8piZWQMq0sG9P3C/pNuBZ7sbI+KE0lKZmVlDKVIsvlR6CjMza2hFnrP4paQ3AhMj4j8k7Q7sUn40MzNrFEUGEvw7
4HrgO6lpLPDTAtsNl3S7pN9Kul/Sl1L7BEm3SVot6TpJu6b23dL8mrR8fMW+zk3tD0k6uv/fppmZDUaRW2fPBI4EngaIiNXA6wps9wJwVES8CzgIOEbSEcBXgcsiYiKwGZiR1p8BbI6ItwCXpfWQ9Day93G/HTgG+LYkn9mYmdVQkWLxQkS82D0jaRjZm/L6FJnONPva9Ol+w971qX0BcGKanpbmScunSFJqXxgRL0TEo8AaqrzD28zMylOkWPxS0ueBEZLeD/wQ+FmRnUvaRdI9wCZgGfAw8FREdKVV1pFd1iJ9/T1AWr4F2Keyvco2ZmZWA0XuhppNdoloFfBx4Ebgu0V2HhEvAQdJGg38BPizaqulr+plWW/t25A0E5gJ0NLSQnt7e5GIVbWMgFmTuvJXbABDkXUwP6v+6uzsrOnxBsNZy9NMeZspK5SXt8jdUC+nFx7dRvZL+qGIyL0M1WMfT6WRa48ARksals4exgHr02rrgAOAdelS1yigo6K9W+U2lceYB8wDaG1tjba2tv5E3MYV1yzm0lWFxlisu1mTugadde2pbUMTpoD29nYG89+mlpy1PM2Ut5myQnl5i9wNdTzZ5aPLgW8BayQdW2C7/dIZBZJGAO8DHgRWACel1aYDi9P0kjRPWn5zKkpLgFPS3VITgInA7cW+PTMzGwpF/iS9FJgcEWsAJL0ZuAH495zt9gcWpDuXXgMsioilkh4AFkq6iOz1rFel9a8Cvi9pDdkZxSkAEXG/pEVk7//uAs5Ml7fMzKxGihSLTd2FInmErMO6TxFxL3BwlfZHqHI3U0Q8D5zcy74uBi4ukNXMzErQa7GQ9KE0eb+kG4FFZH0WJwN31CCbmZk1iL7OLP6qYnoj8Jdp+klgTGmJzMys4fRaLCLijFoGMTOzxpXbZ5HuQDoLGF+5vocoNzPbeRTp4P4p2Z1KPwNeLjeO1cP42TfU7FizJnVxesXx1s49vmbHNrOBK1Isno+Iy0tPYmZmDatIsfimpDnAL8hGkgUgIu4qLZWZmTWUIsViEvAxstFiuy9DdY8ea2ZmO4EixeKDwJsqhyk3M7OdS5Ehyn8LjC47iJmZNa4iZxYtwO8k3cG2fRa+ddbMbCdRpFjMKT2FmZk1tCLvs/hlLYKYmVnjKvIE9zO8+ma6Xcnepf1sROxVZjAzM2scRc4s9qycl3QiVYYYNzOzHVeRu6G2ERE/xc9YmJntVIpchvpQxexrgFZevSxlZmY7gSJ3Q1W+16ILWAtMKyWNmZk1pCJ9FgN6r4WkA4Crgf9BNkzIvIj4pqS9gevIhjxfC3w4IjZLEvBN4DjgOeD07vGnJE0HvpB2fVFELBhIJjMzG5i+Xqv6xT62i4i4MGffXcCsiLhL0p7ASknLgNOB5RExV9JsYDZwDnAsMDF9DgeuBA5PxWUOr17+WilpSURsLvQdmpnZoPXVwf1slQ/ADLJf7n2KiA3dZwYR8QzwIDCW7BJW95nBAuDEND0NuDoytwKjJe0PHA0si4iOVCCWAccU/xbNzGywFJHfV53ODM4mKxSLgEsjYlPhg0jjgV8B7wAej4jRFcs2R8QYSUuBuRFxS2pfTlaU2oDhEXFRaj8P2BoRl/Q4xkxgJkBLS8uhCxcuLBpvO5s6trBx64A3r6mWETRNVtg+76Sxo+oXJkdnZycjR46sd4xCmikrNFfeZsoKg8s7efLklRHRWm1Zn30W6RLQZ4FTyc4CDunv5R9JI4EfAZ+OiKezronqq1Zpiz7at22ImAfMA2htbY22trb+xNzGFdcs5tJVRfr+62/WpK6myQrb5117alv9wuRob29nMP+OaqmZskJz5W2mrFBe3l4vQ0n6GnAH8AwwKSLOH0CheC1ZobgmIn6cmjemy0ukr91nKOuAAyo2Hwes76PdzMxqpK8+i1nA68nuQlov6en0eUbS03k7Tnc3XQU8GBFfr1i0BJiepqcDiyvaT1PmCGBLRGwAbgKmShojaQwwNbWZ
mVmN9Hr9IiL6/XR3D0eSvWFvlaR7UtvngbnAIkkzgMeBk9OyG8lum11DduvsGSlHh6QLyc5yAC6IiI5BZjMzs34o7WJ36qjurYNiSpX1Azizl33NB+YPXTozM+uPwZ49mJnZTsDFwszMcrlYmJlZLhcLMzPL5WJhZma5XCzMzCyXi4WZmeVysTAzs1wuFmZmlsvFwszMcrlYmJlZLhcLMzPL5WJhZma5XCzMzCyXi4WZmeVysTAzs1wuFmZmlqu0YiFpvqRNku6raNtb0jJJq9PXMaldki6XtEbSvZIOqdhmelp/taTp1Y5lZmblKvPM4nvAMT3aZgPLI2IisDzNAxwLTEyfmcCVkBUXYA5wOHAYMKe7wJiZWe2UViwi4ldAR4/macCCNL0AOLGi/erI3AqMlrQ/cDSwLCI6ImIzsIztC5CZmZVMEVHezqXxwNKIeEeafyoiRlcs3xwRYyQtBeZGxC2pfTlwDtAGDI+Ii1L7ecDWiLikyrFmkp2V0NLScujChQsHnHtTxxY2bh3w5jXVMoKmyQrb5500dlT9wuTo7Oxk5MiR9Y5RSDNlhebK20xZYXB5J0+evDIiWqstGzaoVENHVdqij/btGyPmAfMAWltbo62tbcBhrrhmMZeuapQfTd9mTepqmqywfd61p7bVJcf42TfkrjNr0ktcesuzQ37stXOPH/J9tre3M5h/87XWTHmbKSuUl7fWd0NtTJeXSF83pfZ1wAEV640D1vfRbmZmNVTrYrEE6L6jaTqwuKL9tHRX1BHAlojYANwETJU0JnVsT01tZmZWQ6Vdv5B0LVmfw76S1pHd1TQXWCRpBvA4cHJa/UbgOGAN8BxwBkBEdEi6ELgjrXdBRPTsNDczs5KVViwi4qO9LJpSZd0AzuxlP/OB+UMYzczM+slPcJuZWS4XCzMzy+ViYWZmuVwszMwsl4uFmZnlcrEwM7NcLhZmZpbLxcLMzHK5WJiZWS4XCzMzy+ViYWZmuVwszMwsl4uFmZnlcrEwM7NczfM+TrMdRJFXuvbXrEldnJ6z3zJe52o7D59ZmJlZLhcLMzPL5WJhZma5mqZYSDpG0kOS1kiaXe88ZmY7k6bo4Ja0C/DPwPuBdcAdkpZExAP1TWZmRfTs1C/SIT8U3Kk/dJqiWACHAWsi4hEASQuBaYCLhZn1aijuPBtIYdsRi5Qiot4Zckk6CTgmIv42zX8MODwi/r5inZnAzDR7IPDQIA65L/DHQWxfS82UFZorr7OWp5nyNlNWGFzeN0bEftUWNMuZhaq0bVPlImIeMG9IDibdGRGtQ7GvsjVTVmiuvM5anmbK20xZoby8zdLBvQ44oGJ+HLC+TlnMzHY6zVIs7gAmSpogaVfgFGBJnTOZme00muIyVER0Sfp74CZgF2B+RNxf4iGH5HJWjTRTVmiuvM5anmbK20xZoaS8TdHBbWZm9dUsl6HMzKyOXCzMzCyXi0Uiab6kTZLuq3eWIiQdIGmFpAcl3S/p7Hpn6o2k4ZJul/TblPVL9c6UR9Iuku6WtLTeWfJIWitplaR7JN1Z7zx5JI2WdL2k36V/v39R70zVSDow/Uy7P09L+nS9c/VG0mfS/1/3SbpW0vAh3b/7LDKS3gt0AldHxDvqnSePpP2B/SPiLkl7AiuBExtxCBRJAvaIiE5JrwVuAc6OiFvrHK1Xkj4LtAJ7RcQH6p2nL5LWAq0R0RQPjklaAPw6Ir6b7m7cPSKeqneuvqQhh54gexj4sXrn6UnSWLL/r94WEVslLQJujIjvDdUxfGaRRMSvgI565ygqIjZExF1p+hngQWBsfVNVF5nONPva9GnYv1IkjQOOB75b7yw7Gkl7Ae8FrgKIiBcbvVAkU4CHG7FQVBgGjJA0DNidIX4WzcViByBpPHAwcFt9k/QuXda5B9gELIuIhs0KfAP4B+DlegcpKIBfSFqZhr1pZG8CngT+NV3m+66kPeodqoBTgGvrHaI3EfEEcAnwOLAB2BIRvxjK
Y7hYNDlJI4EfAZ+OiKfrnac3EfFSRBxE9vT9YZIa8lKfpA8AmyJiZb2z9MOREXEIcCxwZrqk2qiGAYcAV0bEwcCzQEO/ciBdKjsB+GG9s/RG0hiywVUnAK8H9pD0N0N5DBeLJpau//8IuCYiflzvPEWkSw7twDF1jtKbI4ETUj/AQuAoSf9W30h9i4j16esm4CdkozQ3qnXAuoozy+vJikcjOxa4KyI21jtIH94HPBoRT0bEfwM/Bt49lAdwsWhSqdP4KuDBiPh6vfP0RdJ+kkan6RFk/7B/V99U1UXEuRExLiLGk116uDkihvQvtKEkaY90gwPpcs5UoGHv6IuIPwC/l3RgappC479q4KM08CWo5HHgCEm7p98NU8j6MYeMi0Ui6VrgP4EDJa2TNKPemXIcCXyM7C/f7lv7jqt3qF7sD6yQdC/ZOF/LIqLhb0ltEi3ALZJ+C9wO3BARP69zpjxnAdekfw8HAV+uc55eSdqd7KVrDX3mns7UrgfuAlaR/W4f0mE/fOusmZnl8pmFmZnlcrEwM7NcLhZmZpbLxcLMzHK5WJiZWS4XC6sbSSHp0or5z0k6f4j2/T1JJw3FvnKOc3IaOXVFj/bxkramIS0eTKPuTi+wv4PKvgW6QOZ70gjBv+l+HkJSq6TLi2SWdL6kz5X5PVjtuVhYPb0AfEjSvvUOUimNMFrUDOCTETG5yrKHI+LgiPgzsgf8PiPpjJz9HQSU/bxMXuaDIuJdwALg8wARcWdEfKrnymnQulpktjpzsbB66iJ7cOgzPRf0PDOQ1Jm+tkn6paRFkv5L0lxJp6a/3FdJenPFbt4n6ddpvQ+k7XeR9DVJd0i6V9LHK/a7QtIPyB5q6pnno2n/90n6amr7IvAe4P9K+lpf32hEPAJ8FvhU2vaw9Jf73d1/wacxiC4APpL+uv9IekJ7fsp7t6Rpafu3p+/5nvR9TBzqzMBewOaKn8/SNH2+pHmSfgFc3TNz2vZtktolPSJpuyJjTSgi/PGnLh+y94fsBawFRgGfA85Py74HnFS5bvraBjxF9lT4bmTvGPhSWnY28I2K7X9O9gfRRLIxiYYDM4EvpHV2A+4kG3ytjWxQuwlVcr6ebDiF/cgGwruZ7N0hkI1z1Vplm9bSHWsAAAKZSURBVPHAfT3aRgNb0/RewLA0/T7gR2n6dOBbFdt8Gfibiu3/C9gDuAI4NbXvCowYosxbgXuAh8lGL31Dxc99aZo+n+z9KSN6yXw+8Jv0890X+BPw2nr/e/NncJ9hmNVRRDwt6Wqyv7i3FtzsjojYACDpYaB7KOZVQOWllUUR8TKwWtIjwFvJxk56Z8VZyyiyYvIicHtEPFrleH8OtEfEk+mY15C9k+GnBfN2U8X0KGBBOiMIsnd8VDOVbGDD7j6A4cAbyIam+Udl7974cUSsHqLMD0c2OjDpLGEe1Qd9XBIRff33uiEiXgBekLSJbFiSdTnHtgbmy1DWCL5Bdh298r0GXaR/n2lgtF0rlr1QMf1yxfzLsM0fQD3HsgmyX9hnRXZd/qCImBCvjvv/bC/51Et7fx3Mq4O7XQisiOytjH9FVgR6O/ZfV+R9Q0Q8GBE/IBs2eytwk6SjSsi8hKzAVNPbz6pb5X+jl8B/mDY7Fwuru4joABaRFYxua4FD0/Q0ev/Luy8nS3pN6sd4E/AQcBPwv5UN746k/6n8l+/cBvylpH1T5/dHgV/2J4iyF1RdQnb5CLIziyfS9OkVqz4D7FkxfxNwViqYSDo4fX0T8EhEXE72S/2dQ52ZrG/j4QLr9cxsOyAXC2sUl5Jd3+72L2S/7G4HDif/L9lqHiL7BfnvwCci4nmyV6U+ANwl6T7gO+T81ZsueZ0LrAB+S/Zug8UFjv/m7ltnyYrhFRHxr2nZPwFfkfT/gMq7r1aQdQ53dxZfSFYo7015L0zrfQS4T9nbB99K1tE8VJnvUTaK7ZeBvy2wTc/MtgPyqLNmZpbLZxZmZpbLxcLMzHK5WJiZ
WS4XCzMzy+ViYWZmuVwszMwsl4uFmZnl+v/hMH7iOBqh3gAAAABJRU5ErkJggg==",
      "text/plain": [
       "<Figure size 432x288 with 1 Axes>"
      ]
     },
     "metadata": {
      "needs_background": "light"
     },
     "output_type": "display_data"
    }
   ],
   "source": [
    "adult_pii['DOB'].value_counts().hist()\n",
    "plt.xlabel('Number of Records')\n",
    "plt.ylabel('Number of Birthdates');"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "pycharm": {
     "name": "#%% md\n"
    }
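   },
   "source": [
    "We can use the same approach to measure how selective ZIP codes are. This time the result is even more striking: ZIP code is *very* selective in this dataset. Nearly every ZIP code occurs only once."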
   ]
  },
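  {
   "cell_type": "markdown",
   "metadata": {
    "pycharm": {
     "name": "#%% md\n"
    }
   },
   "source": [
    "The histograms above can be condensed into a single number per column. The following is a minimal sketch rather than part of the original analysis: the helper `unique_fraction` and the toy `demo` table are hypothetical names introduced for illustration, and the values in `demo` are made up.\n",
    "\n",
    "```python\n",
    "import pandas as pd\n",
    "\n",
    "def unique_fraction(column: pd.Series) -> float:\n",
    "    # Fraction of distinct values that occur exactly once in the column.\n",
    "    counts = column.value_counts()\n",
    "    return float((counts == 1).mean())\n",
    "\n",
    "# Toy stand-in for adult_pii; the values are made up.\n",
    "demo = pd.DataFrame({\n",
    "    'DOB': ['1/1/1970', '1/1/1970', '2/2/1980', '3/3/1990'],\n",
    "    'Zip': ['05401', '05402', '05403', '05404'],\n",
    "})\n",
    "\n",
    "dob_unique = unique_fraction(demo['DOB'])\n",
    "zip_unique = unique_fraction(demo['Zip'])\n",
    "print(dob_unique, zip_unique)\n",
    "```\n",
    "\n",
    "Applied to the real data, `unique_fraction(adult_pii['Zip'])` would give the share of ZIP codes that point to a single record. A value close to 1 means the column is highly selective."
   ]
  },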
  {
   "cell_type": "code",
   "execution_count": 27,
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    },
    "tags": [
     "hide-input"
    ]
   },
   "outputs": [
    {
     "data": {
      "image/png": "iVBORw0KGgoAAAANSUhEUgAAAZEAAAEGCAYAAACkQqisAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4yLjEsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy+j8jraAAAbs0lEQVR4nO3df7xVdZ3v8dfbnzlqgpJnSEzUyzQ5Wg4Sch/6sINOitoVM3X0YYKOE1ZoNjn3io1J6jRjU9SM1mhUFNxIJDMhpYyMo+PcVPyViOaAhkZwRUNR0DTkM3+s75HtZp991l6ctffZnPfz8diPvdZ3fddan73g7M9e3+9a36WIwMzMrIjtWh2AmZm1LycRMzMrzEnEzMwKcxIxM7PCnETMzKywHVodQLMNGTIkhg8fXmjdDRs2sOuuu/ZtQH3AcTXGcTXGcTVmW43rgQceeD4i3rHFgogYUK/DDjssilq0aFHhdcvkuBrjuBrjuBqzrcYF3B81vlPdnGVmZoU5iZiZWWFOImZmVpiTiJmZFeYkYmZmhTmJmJlZYU4iZmZWmJOImZkV5iRiZmaFDbhhT7bGkt+t45wptzV9vyuuPrHp+zQzy8NnImZmVpiTiJmZFeYkYmZmhTmJmJlZYU4iZmZWmJOImZkV5iRiZmaFOYmYmVlhTiJmZlaYk4iZmRXmJGJmZoU5iZiZWWFOImZmVpiTiJmZFeYkYmZmhTmJmJlZYU4iZmZWmJOImZkV5iRiZmaFOYmYmVlhTiJmZlaYk4iZmRXmJGJmZoU5iZiZWWFOImZmVpiTiJmZFVZaEpG0r6RFkh6XtFTSRal8T0kLJS1L74NTuSRdI2m5pEckjazY1sRUf5mkiRXlh0lakta5RpLK+jxmZralMs9ENgIXR8R7gDHAZEkHAVOAOyJiBHBHmgc4HhiRXpOA6yBLOsBU4HBgNDC1O/GkOpMq1htX4ucxM7MqpSWRiFgdEQ+m6ZeBx4F9gPHAzFRtJnBymh4PzIrMPcAgSUOB44CFEbE2Il4AFgLj0rK3R8QvIyKAWRXbMjOzJmhKn4ik4cBfAvcCHRGxGrJEA+ydqu0D/LZitZWprF75yhrlZmbWJDuUvQNJuwE/BD4dES/V6baotSAKlNeKYRJZsxcdHR10dXX1EnVtHbvAxYdsLLTu1ugt3vXr1xf+TGVyXI1xXI1xXI0pK65Sk4ikHckSyOyIuDkVPytpaESsTk1Sa1L5SmDfitWHAatSeWdVeVcqH1aj/hYiYjowHWDUqFHR2dlZq1qvrp09j2lLSs+7W1hxVmfd5V1dXRT9TGVyXI1xXI1xXI0pK64yr84S8G3g8Yj4SsWi+UD3FVYTgXkV5RPSVVpjgHWpuet24FhJg1OH+rHA7WnZy5LGpH1NqNiWmZk1QZk/q48AzgaWSHo4lX0WuBqYK+k84BngtLRsAXACsBx4BTgXICLWSroKWJzqXRkRa9P0J4DvArsAP0kvMzNrktKSSETcTe1+C4BjatQPYHIP25oBzKhRfj9w8FaEaWZmW8F3rJuZWWFOImZmVpiTiJmZFeYkYmZmhTmJmJlZYb0mEUmnSdo9TV8m6ebKEXbNzGzgynMm8rmIeFnSkWSDIc4kjbBrZmYDW54k8kZ6PxG4LiLmATuVF5KZmbWLPEnkd5K+AZwOLJC0c871zMxsG5cnGZxONn7VuIh4EdgT+N+lRmVmZm2h1yQSEa+QjbR7ZCraCCwrMygzM2sPea7OmgpcAlyainYEvldmUGZm1h7yNGd9GDgJ2AAQEauA3csMyszM2kOeJPJ6GmE3ACTtWm5IZmbWLvIkkbnp6qxBkj4G/Bz4ZrlhmZlZO+j1eSIR8WVJHwReAt4NXB4RC0uPzMzM+r1ek4ik/YH/6E4cknaRNDwiVpQdnJmZ9W95mrN+AGyqmH8jlZmZ2QCXJ4nsEBGvd8+kaQ97YmZmuZLIc5JO
6p6RNB54vryQzMysXfTaJwJ8HJgt6WuAgN8CE0qNyszM2kKeq7OeBMZI2g1QRLxcflhmZtYO8lydtTPwEWA4sIMkACLiylIjMzOzfi9Pc9Y8YB3wAPBaueGYmVk7yZNEhkXEuNIjMTOztpPn6qz/J+mQ0iMxM7O2k+dM5EjgHEm/IWvOEhAR8d5SIzMzs34vTxI5vvQozMysLeV5suHTwL7A0Wn6lTzrmZnZts9PNjQzs8L8ZEMzMyvMTzY0M7PC/GRDMzMrrO7VWcrGOLkR+HP8ZEMzM6tSN4lEREi6JSIOA5w4zMzsLfI0Z90j6f2lR2JmZm0nTxIZC/xS0pOSHpG0RNIjva0kaYakNZIerSj7vKTfSXo4vU6oWHappOWSnpB0XEX5uFS2XNKUivL9Jd0raZmkGyX5aYtmZk2Wp0/k48DTBbb9XeBrwKyq8q9GxJer9nMQcAbwF8A7gZ9L+rO0+OvAB4GVwGJJ8yPiMeCLaVtzJF0PnAdcVyBOMzMrqO6ZSLq096sR8XT1q7cNR8RdwNqccYwH5kTEaxHxG2A5MDq9lkfEU+nZ7nOA8Sm5HQ3clNafCZycc19mZtZH8oyddY+k90fE4j7a5wWSJgD3AxdHxAvAPsA9FXVWpjLIHsdbWX44sBfwYkRsrFF/C5ImAZMAOjo66OrqKhR4xy5w8SEbe6/Yx3qLd/369YU/U5kcV2McV2McV2PKiitPEhkLnC/pabK71rdmFN/rgKvIbly8CpgG/E3aZrWg9plS1KlfU0RMB6YDjBo1Kjo7OxsKutu1s+cxbUmeQ9a3VpzVWXd5V1cXRT9TmRxXYxxXYxxXY8qKq6mj+EbEs93Tkr4J3JpmV5IN8thtGLAqTdcqf57s5scd0tlIZX0zM2uSPFdnRQ+vhkkaWjH7YaD7yq35wBmSdpa0PzACuA9YDIxIV2LtRNb5Pj/11SwCTk3rTyR7jK+ZmTVRnjOR29jchPQ2YH/gCbIrqXok6QagExgiaSUwFeiUdGja3grgfICIWCppLvAYsBGYHBFvpO1cANwObA/MiIilaReXAHMk/SPwEPDtfB/ZzMz6Sq9JJCLe8mhcSSNJX/69rHdmjeIev+gj4gvAF2qULwAW1Ch/iuzqLTMza5GGHy4VEQ8CvoPdzMx6PxOR9JmK2e2AkcBzpUVkZmZtI0+fSOUDqDaS9ZH8sJxwzMysneTpE7miGYGYmVn7yfOM9YWSBlXMD5Z0e7lhmZlZO8jTsf6OiHixeyYNU7J3eSGZmVm7yJNE3pD0ru4ZSftR8GZDMzPbtuTpWP8H4G5Jd6b5o0iDGZqZ2cCWp2P9p+kGwzFkd63/XUQ8X3pkZmbW7+XpWP8w8MeIuDUifgxslORnd5iZWa4+kakRsa57JnWyTy0vJDMzaxd5kkitOs1/qIaZmfU7eZLI/ZK+IulASQdI+irwQNmBmZlZ/5cniVwIvA7cCPwA+AMwucygzMysPeS5OmtDembHVRGxoQkxmZlZm6h7JiLpk5KeAZ4GnpH0tKRPNic0MzPr73pMIpIuAz4EdEbEXhGxFzAWOD4tMzOzAa7emcjZwCnpCYLAm08TPB2YUHZgZmbW/9VtzoqIP9QoexXYVFpEZmbWNuolkZWSjqkulHQ0sLq8kMzMrF3UuzrrU8A8SXeT3RcSZM9WPwIY34TYzMysn+vxTCQilgIHA3cBw4ED0vTBaZmZmQ1wde8TSX0iM5oUi5mZtZk8d6ybmZnV5CRiZmaF1bvZ8I70/sXmhWNmZu2kXp/IUEkfAE6SNIfsqYZviogHS43MzMz6vXpJ5HJgCjAM+ErVsgCOLisoMzNrDz0mkYi4CbhJ0uci4qomxmRmZm0iz1DwV0k6CTgqFXVFxK3lhmVmZu2g16uzJP0zcBHwWHpdlMrMzGyAy/Os9BOBQyNiE4CkmcBDwKVlBmZmZv1f3vtEBlVM71FGIGZm1n7ynIn8M/CQpEVkl/kehc9CzMyMfB3rN0jqIhvB
V8AlEfH/yw7MzMz6v1zNWRGxOiLmR8S8vAlE0gxJayQ9WlG2p6SFkpal98GpXJKukbRc0iOSRlasMzHVXyZpYkX5YZKWpHWukSTMzKypyhw767vAuKqyKcAdETECuCPNAxwPjEivScB1kCUdYCpwODAamNqdeFKdSRXrVe/LzMxKVloSiYi7gLVVxeOBmWl6JnByRfmsyNwDDJI0FDgOWBgRayPiBWAhMC4te3tE/DIiAphVsS0zM2uSun0ikrYDHomIg/tofx0RsRqyJjJJe6fyfYDfVtRbmcrqla+sUV6TpElkZy10dHTQ1dVVLPhd4OJDNhZad2v0Fu/69esLf6YyOa7GOK7GOK7GlBVXbw+l2iTpV5LeFRHP9PneN6vVnxEFymuKiOnAdIBRo0ZFZ2dngRDh2tnzmLYkzwVtfWvFWZ11l3d1dVH0M5XJcTXGcTXGcTWmrLjyfCMOBZZKug/Y0F0YEScV2N+zkoams5ChwJpUvhLYt6LeMGBVKu+sKu9K5cNq1DczsybKk0Su6MP9zQcmAlen93kV5RekIecPB9alRHM78E8VnenHApdGxFpJL0saA9wLTACu7cM4zcwshzz3idwpaT9gRET8XNKfANv3tp6kG8jOIoZIWkl2ldXVwFxJ5wHPAKel6guAE4DlwCvAuWnfayVdBSxO9a6MiO7O+k+QXQG2C/CT9DIzsybqNYlI+hhZp/SewIFkHdjXA8fUWy8izuxh0RbrpSusJvewnRnAjBrl9wN91eFvZmYF5LnEdzJwBPASQEQsA/auu4aZmQ0IeZLIaxHxeveMpB2ocyWUmZkNHHmSyJ2SPgvsIumDwA+AH5cblpmZtYM8SWQK8BywBDifrBP8sjKDMjOz9pDn6qxN6UFU95I1Yz2ROsLNzGyAy3N11olkV2M9SXan+P6Szo8IX1JrZjbA5bnZcBowNiKWA0g6ELgN35dhZjbg5ekTWdOdQJKn2DxciZmZDWA9nolIOiVNLpW0AJhL1idyGpvvIDczswGsXnPW/6qYfhb4QJp+Dhi8ZXUzMxtoekwiEXFuMwMxM7P2k+fqrP2BC4HhlfULDgVvZmbbkDxXZ90CfJvsLvVN5YZjZmbtJE8S+UNEXFN6JGZm1nbyJJF/kzQV+BnwWndhRDxYWlRmZtYW8iSRQ4CzgaPZ3JwVad7MzAawPEnkw8ABlcPBm5mZQb471n8FDCo7EDMzaz95zkQ6gF9LWsxb+0R8ia+Z2QCXJ4lMLT0KMzNrS3meJ3JnMwIxM7P2k+eO9ZfZ/Ez1nYAdgQ0R8fYyAzMzs/4vz5nI7pXzkk4GRpcWkZmZtY08V2e9RUTcgu8RMTMz8jVnnVIxux0wis3NW2ZmNoDluTqr8rkiG4EVwPhSojEzs7aSp0/EzxUxM7Oa6j0e9/I660VEXFVCPGZm1kbqnYlsqFG2K3AesBfgJGJmNsDVezzutO5pSbsDFwHnAnOAaT2tZ2ZmA0fdPhFJewKfAc4CZgIjI+KFZgRmZmb9X70+kS8BpwDTgUMiYn3TojIzs7ZQ72bDi4F3ApcBqyS9lF4vS3qpOeGZmVl/Vq9PpOG72c3MbGBxojAzs8JakkQkrZC0RNLDku5PZXtKWihpWXofnMol6RpJyyU9ImlkxXYmpvrLJE1sxWcxMxvIWnkmMjYiDo2IUWl+CnBHRIwA7kjzAMcDI9JrEnAdvHnl2FTgcLJRhad2Jx4zM2uO/tScNZ7sMmLS+8kV5bMicw8wSNJQ4DhgYUSsTZcdLwTGNTtoM7OBTBHNH5BX0m+AF8hGA/5GREyX9GJEDKqo80JEDJZ0K3B1RNydyu8ALgE6gbdFxD+m8s8Br0bEl2vsbxLZWQwdHR2HzZkzp1Dca9au49lXC626VQ7ZZ4+6y9evX89uu+3WpGjyc1yNcVyNcVyN2dq4xo4d+0BFy9Gb8oziW4YjImKVpL2BhZJ+XaeuapRFnfItCyOmk93vwqhRo6Kzs7PBcDPXzp7H
tCXNP2Qrzuqsu7yrq4uin6lMjqsxjqsxjqsxZcXVkuasiFiV3tcAPyLr03g2NVOR3tek6iuBfStWHwasqlNuZmZN0vQkImnXNBYXknYFjgUeBeYD3VdYTQTmpen5wIR0ldYYYF1ErAZuB46VNDh1qB+byszMrEla0ZzVAfxIUvf+vx8RP5W0GJgr6TzgGeC0VH8BcAKwHHiFbBBIImKtpKuAxanelRGxtnkfw8zMmp5EIuIp4H01yn8PHFOjPIDJPWxrBjCjr2M0M7N8+tMlvmZm1macRMzMrDAnETMzK8xJxMzMCmvVzYbWJoZPua3wuhcfspFzCq6/4uoTC+/XzJrHZyJmZlaYk4iZmRXmJGJmZoU5iZiZWWFOImZmVpiTiJmZFeYkYmZmhTmJmJlZYU4iZmZWmJOImZkV5iRiZmaFOYmYmVlhTiJmZlaYk4iZmRXmJGJmZoU5iZiZWWFOImZmVpiTiJmZFeYkYmZmhTmJmJlZYU4iZmZWmJOImZkV5iRiZmaFOYmYmVlhTiJmZlaYk4iZmRXmJGJmZoU5iZiZWWFOImZmVtgOrQ7AzDJLfreOc6bc1pJ9r7j6xJbs19pf25+JSBon6QlJyyVNaXU8ZmYDSVsnEUnbA18HjgcOAs6UdFBrozIzGzjavTlrNLA8Ip4CkDQHGA881tKozCyX4VvRfHfxIRsLN/+5+a7vKCJaHUNhkk4FxkXE36b5s4HDI+KCqnqTgElp9t3AEwV3OQR4vuC6ZXJcjXFcjXFcjdlW49ovIt5RXdjuZyKqUbZFVoyI6cD0rd6ZdH9EjNra7fQ1x9UYx9UYx9WYgRZXW/eJACuBfSvmhwGrWhSLmdmA0+5JZDEwQtL+knYCzgDmtzgmM7MBo62bsyJio6QLgNuB7YEZEbG0xF1udZNYSRxXYxxXYxxXYwZUXG3dsW5mZq3V7s1ZZmbWQk4iZmZWmJNIFUkzJK2R9GgPyyXpmjTMyiOSRvaTuDolrZP0cHpd3qS49pW0SNLjkpZKuqhGnaYfs5xxNf2YSXqbpPsk/SrFdUWNOjtLujEdr3slDe8ncZ0j6bmK4/W3ZcdVse/tJT0k6dYay5p+vHLG1ZLjJWmFpCVpn/fXWN63f48R4VfFCzgKGAk82sPyE4CfkN2jMga4t5/E1Qnc2oLjNRQYmaZ3B/4LOKjVxyxnXE0/ZukY7JamdwTuBcZU1fkkcH2aPgO4sZ/EdQ7wtWb/H0v7/gzw/Vr/Xq04XjnjasnxAlYAQ+os79O/R5+JVImIu4C1daqMB2ZF5h5gkKSh/SCuloiI1RHxYJp+GXgc2KeqWtOPWc64mi4dg/Vpdsf0qr66ZTwwM03fBBwjqdaNtc2OqyUkDQNOBL7VQ5WmH6+ccfVXffr36CTSuH2A31bMr6QffDkl/zM1R/xE0l80e+epGeEvyX7FVmrpMasTF7TgmKUmkIeBNcDCiOjxeEXERmAdsFc/iAvgI6kJ5CZJ+9ZYXoZ/Bf4PsKmH5S05XjnigtYcrwB+JukBZUM+VevTv0cnkcblGmqlBR4kG9vmfcC1wC3N3Lmk3YAfAp+OiJeqF9dYpSnHrJe4WnLMIuKNiDiUbISF0ZIOrqrSkuOVI64fA8Mj4r3Az9n86780kj4ErImIB+pVq1FW6vHKGVfTj1dyRESMJBvdfLKko6qW9+nxchJpXL8caiUiXupujoiIBcCOkoY0Y9+SdiT7op4dETfXqNKSY9ZbXK08ZmmfLwJdwLiqRW8eL0k7AHvQxKbMnuKKiN9HxGtp9pvAYU0I5wjgJEkrgDnA0ZK+V1WnFcer17hadLyIiFXpfQ3wI7LRziv16d+jk0jj5gMT0hUOY4B1EbG61UFJ+tPudmBJo8n+bX/fhP0K+DbweER8pYdqTT9meeJqxTGT9A5Jg9L0LsBfAb+uqjYfmJimTwV+EalHtJVxVbWbn0TWz1SqiLg0IoZFxHCyTvNfRMRHq6o1/XjliasVx0vSrpJ2754G
jgWqr+js07/Hth72pAySbiC7ameIpJXAVLJORiLiemAB2dUNy4FXgHP7SVynAp+QtBF4FTij7D+k5AjgbGBJak8H+CzwrorYWnHM8sTVimM2FJip7IFq2wFzI+JWSVcC90fEfLLk938lLSf7RX1GyTHljetTkk4CNqa4zmlCXDX1g+OVJ65WHK8O4Efpt9EOwPcj4qeSPg7l/D162BMzMyvMzVlmZlaYk4iZmRXmJGJmZoU5iZiZWWFOImZmVpiTiLUtSSFpWsX830v6fB9t+7uSTu2LbfWyn9OUjTS8qKp8sjaP/vqwpEfT532PstGHb031KkeKfUzSx3rYz2hJd0l6QtKvJX1L0p80EOeKZt6Iae3DScTa2WvAKf3tyy3da5HXecAnI2JsZWFEfD0iDu1+kd0gNjsiat2wdmOq0wn8k6SOqng6gB8Al0TEu4H3AD8lG93YbKs4iVg720j23Oi/q15QfSYhaX1675R0p6S5kv5L0tWSzlL2LI0lkg6s2MxfSfqPVO9Daf3tJX1J0mJlA+udX7HdRZK+DyypEc+ZafuPSvpiKrscOBK4XtKXevqQysY+Op1syPMepWEungT2q1o0GZgZEb9M9SIiboqIZyXtKemW9FnukfTetM+9JP1M2bMyvkHFeEuSPpqO18OSvpGOyfbpmD+aPucW/ya2bfId69buvg48IulfGljnfWS/xtcCTwHfiojRyh5cdSHw6VRvOPAB4EBgkaT/AUwgGybi/ZJ2Bv5T0s9S/dHAwRHxm8qdSXon8EWysZNeIBth9eSIuFLS0cDfR8QWDw9K6w4CvgNMqDGAZHXdA4ADyO5ErnQwPQ/+dwXwUEScnGKZBRxKNiLC3SnGE4FJaR/vAf6abJC/P0r6d+AsYCmwT0QcXBG3DQBOItbWIuIlSbOAT5ENXZLH4u6xgiQ9CXQngSVAZbPS3IjYBCyT9BTw52RjEb234ixnD2AE8DpwX3UCSd4PdEXEc2mfs8keMpZn1ODrgO9FxH/WqfPXko4ka947PyIaGXzwSOAjABHxi3QGskeK75RUfpukF1L9Y8iS4eI0tMYuZEPH/xg4QNK1wG1sPqa2jXMSsW3Bv5IN6/6dirKNpOZaZd92O1Use61ielPF/Cbe+jdRPSZQkDXrXBgRt1cukNQJbOghvkIPSJI0kexs6Oxeqt4YERfUWb6U7It/Xs7Youq9uv7MiLh0iwXS+4DjyJrPTgf+pl7Qtm1wn4i1vfTLey5ZJ3W3FWweens8abDKBp0mabvUT3IA8ARwO9mgjTsCSPozZaOl1nMv8AFJQ1Kn+5nAnfVWSE1TXwDOSg9a2hpfAyZKOrxi+x+V9KfAXWTNUd2J8PnUbFZZfjwwOK16B3CqpL3Tsj0l7ZcubtguIn4IfI7sUc42APhMxLYV04DKX+PfBOZJuo/si6+ns4R6niD7su8APh4Rf5D0LbKzgwfTGc5zwMn1NhIRqyVdCiwi+yW/ICJqnRVUugTYFbhZb33S64WNfojUgX4G8OX05b+JLEncDHwe+I6kR8hGdO0eUv0K4AZJD5Idg2fSth6TdBlZv852wB/JzjxeTdvp/mG6xZmKbZs8iq+ZmRXm5iwzMyvMScTMzApzEjEzs8KcRMzMrDAnETMzK8xJxMzMCnMSMTOzwv4bOvDLFho0RoQAAAAASUVORK5CYII=",
      "text/plain": [
       "<Figure size 432x288 with 1 Axes>"
      ]
     },
     "metadata": {
      "needs_background": "light"
     },
     "output_type": "display_data"
    }
   ],
   "source": [
    "adult_pii['Zip'].value_counts().hist()\n",
    "plt.xlabel('Number of Records')\n",
    "plt.ylabel('Number of ZIP Codes');"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "pycharm": {
     "name": "#%% md\n"
    }
   },
   "source": [
    "### How Many People Can We Re-Identify?"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "pycharm": {
     "name": "#%% md\n"
    }
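   },
   "source": [
    "How many people can we re-identify in this dataset? Our auxiliary information lets us find out. First, consider what happens when we use dates of birth alone. We want to know how many *possible identities* each date of birth in the auxiliary data leaves us with in the dataset. The histogram below shows how many records have each number of possible identities. Out of roughly 32,000 rows, we can uniquely identify almost 7,000, and we can narrow about 10,000 more down to two possible identities."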
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 33,
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    },
    "scrolled": true,
    "tags": [
     "hide-input"
    ]
   },
   "outputs": [
    {
     "data": {
      "image/png": "iVBORw0KGgoAAAANSUhEUgAAAYMAAAD4CAYAAAAO9oqkAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4yLjEsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy+j8jraAAAS+klEQVR4nO3df4xd9X3m8fezOCkGF+ws2RG1rTWrWmxTrE1hBLRI0TjugoEo5o8gUdHEiagsVTRLWksNWaliNz9WVApJm2gbyYrZmpbFpU5WWCEtsQijLFIgYJLGECfCTVxioDiVHSdDaFNnP/vH/VqZ2DNjz70zc+fE75c0mnu+53vOfe4w+Lnn3HPvpKqQJJ3d/s2wA0iShs8ykCRZBpIky0CShGUgSQKWDDtAvy666KJas2ZNX9u++uqrnH/++XMbaJ50KSt0K2+XskK38nYpK3Qr7yBZ9+7d+09V9cYpV1ZVJ7+uuOKK6tdjjz3W97YLrUtZq7qVt0tZq7qVt0tZq7qVd5CswNM1zb+pniaSJFkGkiTLQJKEZSBJwjKQJGEZSJKwDCRJWAaSJCwDSRId/jiKLlpz58Oz3mbruuO8u4/tTnbw7hsH3oekn18eGUiSLANJkmUgScIykCRhGUiSsAwkSVgGkiQsA0kSloEkCctAkoRlIEnCMpAkYRlIkjiDMkhyb5LDSZ6dNPaGJHuSPN++r2jjSfKJJAeSfD3J5ZO22dzmP59k86TxK5Lsa9t8Iknm+kFKkmZ2JkcGfw5sPGnsTuDRqloLPNqWAa4H1ravLcCnoFcewF3AVcCVwF0nCqTN2TJpu5PvS5I0z05bBlX1JeDIScObgB3t9g7gpknj91XPE8DyJBcD1wF7qupIVR0F9gAb27oLqurLVVXAfZP2JUlaIP2+ZjBSVS8DtO//ro2vBL47ad6hNjbT+KEpxiVJC2iu/9LZVOf7q4/xqXeebKF3SomRkRHGx8f7iAgTExN9bzuIreuOz3qbkaX9bXeyhXq8w/rZ9qNLWaFbebuUFbqVd76y9lsGryS5uKpebqd6DrfxQ8DqSfNWAS+18bGTxsfb+Kop5k+pqrYB2wBGR0drbGxsuqkzGh8fp99tB9HPn6/cuu449+wbvLMP3jo28D7OxLB+tv3oUlboVt4uZYVu5Z2vrP2eJtoNnLgiaDPw0KTxd7Wriq4GjrXTSI8A1yZZ0V44vhZ4pK37YZKr21VE75q0L0nSAjntU84kD9B7Vn9RkkP0rgq6G3gwyW3AC8DNbfrngRuAA8CPgPcAVNWRJB8CnmrzPlhVJ16U/l16VywtBf6mfUmSFtBpy6CqfmuaVRummFvA7dPs517g3inGnwYuO10OSdL88R3IkiTLQJJkGUiSsAwkSVgGkiQsA0kSloEkCctAkoRlIEnCMpAkYRlIkrAMJElYBpIkLANJEpaBJAnLQJKEZSBJwjKQJGEZSJKwDCRJWAaSJCwDSRKWgSQJy0CShGUgScIykCRhGUiSsAwkSVgGkiQGLIMkv5/kuSTPJnkgyblJLknyZJLnk/xVkte3ub/Qlg+09Wsm7ecDbfxbSa4b7CFJkmar7zJIshL4L8BoVV0GnAPcAvwx8PGqWgscBW5rm9wGHK2qXwY+3uaR5E1tu18FNgJ/luScfnNJkmZv0NNES4ClSZYA5wEvA28FdrX1O4Cb2u1NbZm2fkOStPGdVfUvVfUd4ABw5YC5JEmzkKrqf+PkDuAjwGvAF4A7gCfas3+SrAb+pqouS/IssLGqDrV1fw9cBfy3ts1ftvHtbZtdU9zfFmALwMjIyBU7d+7sK/fExATLli3ra9tB7Hvx2Ky3GVkKr7w2+H2vW3nh4Ds5A8P62fajS1mhW3m7lBW6lXeQrOvXr99bVaNTrVvSb6AkK+g9q78E+D7w18D1U0w90TaZZt1046cOVm0DtgGMjo7W2NjY7EI34+Pj9Lvt
IN5958Oz3mbruuPcs6/v/0w/te/VwfdxBrau+wn3PP7T+zp4940Lcr/9GNbvQb+6lLdLWaFbeecr6yCniX4T+E5Vfa+q/hX4LPAbwPJ22ghgFfBSu30IWA3Q1l8IHJk8PsU2kqQFMMhTzheAq5OcR+800QbgaeAx4B3ATmAz8FCbv7stf7mt/2JVVZLdwP9O8jHgl4C1wFcGyHVa+1481tezdEn6edV3GVTVk0l2Ac8Ax4Gv0juF8zCwM8mH29j2tsl24C+SHKB3RHBL289zSR4EvtH2c3tV/aTfXJKk2RvoZHRV3QXcddLwt5niaqCq+mfg5mn28xF6L0RLkobAdyBLkiwDSZJlIEnCMpAkYRlIkrAMJElYBpIkLANJEpaBJAnLQJKEZSBJwjKQJGEZSJKwDCRJWAaSJCwDSRKWgSQJy0CShGUgScIykCRhGUiSsAwkSVgGkiQsA0kSloEkCctAkoRlIEnCMpAkYRlIkhiwDJIsT7IryTeT7E/y60nekGRPkufb9xVtbpJ8IsmBJF9Pcvmk/Wxu859PsnnQByVJmp1Bjwz+FPjbqvqPwH8C9gN3Ao9W1Vrg0bYMcD2wtn1tAT4FkOQNwF3AVcCVwF0nCkSStDD6LoMkFwBvAbYDVNWPq+r7wCZgR5u2A7ip3d4E3Fc9TwDLk1wMXAfsqaojVXUU2ANs7DeXJGn2UlX9bZi8GdgGfIPeUcFe4A7gxapaPmne0apakeRzwN1V9XgbfxR4PzAGnFtVH27jfwS8VlUfneI+t9A7qmBkZOSKnTt39pX98JFjvPJaX5suuJGldCYrnJp33coLhxfmNCYmJli2bNmwY5yxLuXtUlboVt5Bsq5fv35vVY1OtW7JAJmWAJcD762qJ5P8KT89JTSVTDFWM4yfOli1jV4BMTo6WmNjY7MKfMIn73+Ie/YN8tAXztZ1xzuTFU7Ne/DWseGFOY3x8XH6/R0ahi7l7VJW6Fbe+co6yGsGh4BDVfVkW95Frxxeaad/aN8PT5q/etL2q4CXZhiXJC2Qvsugqv4R+G6SS9vQBnqnjHYDJ64I2gw81G7vBt7Vriq6GjhWVS8DjwDXJlnRXji+to1JkhbIoOcf3gvcn+T1wLeB99ArmAeT3Aa8ANzc5n4euAE4APyozaWqjiT5EPBUm/fBqjoyYC5J0iwMVAZV9TVgqhcjNkwxt4Dbp9nPvcC9g2SRJPXPdyBLkiwDSZJlIEnCMpAkYRlIkrAMJElYBpIkLANJEpaBJAnLQJKEZSBJwjKQJGEZSJKwDCRJWAaSJCwDSRKWgSQJy0CShGUgScIykCRhGUiSsAwkSVgGkiQsA0kSloEkCctAkoRlIEnCMpAkYRlIkpiDMkhyTpKvJvlcW74kyZNJnk/yV0le38Z/oS0faOvXTNrHB9r4t5JcN2gmSdLszMWRwR3A/knLfwx8vKrWAkeB29r4bcDRqvpl4ONtHkneBNwC/CqwEfizJOfMQS5J0hkaqAySrAJuBD7dlgO8FdjVpuwAbmq3N7Vl2voNbf4mYGdV/UtVfQc4AFw5SC5J0uwsGXD7PwH+EPjFtvxvge9X1fG2fAhY2W6vBL4LUFXHkxxr81cCT0za5+RtfkaSLcAWgJGREcbHx/sKPbIUtq47fvqJi0CXssKpefv9b7QQJiYmFnW+k3Upb5eyQrfyzlfWvssgyduAw1W1N8nYieEpptZp1s20zc8OVm0DtgGMjo7W2NjYVNNO65P3P8Q9+wbtwYWxdd3xzmSFU/MevHVseGFOY3x8nH5/h4ahS3m7lBW6lXe+sg7yr8w1wNuT3ACcC1xA70hheZIl7ehgFfBSm38IWA0cSrIEuBA4Mmn8hMnbSH1bc+fDM67fuu447z7NnH4cvPvGOd+nNN/6fs2gqj5QVauqag29F4C/WFW3Ao8B72jTNgMPtdu72zJt/Rerqtr4Le1qo0uAtcBX+s0lSZq9+Tj/8H5gZ5IPA18Ftrfx7cBfJDlA74jg
FoCqei7Jg8A3gOPA7VX1k3nIJUmaxpyUQVWNA+Pt9reZ4mqgqvpn4OZptv8I8JG5yCJJmj3fgSxJsgwkSZaBJAnLQJKEZSBJwjKQJGEZSJKwDCRJWAaSJCwDSRKWgSQJy0CShGUgScIykCRhGUiSsAwkSVgGkiQsA0kSloEkCctAkoRlIEnCMpAkYRlIkrAMJElYBpIkLANJEpaBJAnLQJKEZSBJYoAySLI6yWNJ9id5LskdbfwNSfYkeb59X9HGk+QTSQ4k+XqSyyfta3Ob/3ySzYM/LEnSbAxyZHAc2FpVvwJcDdye5E3AncCjVbUWeLQtA1wPrG1fW4BPQa88gLuAq4ArgbtOFIgkaWH0XQZV9XJVPdNu/xDYD6wENgE72rQdwE3t9ibgvup5Alie5GLgOmBPVR2pqqPAHmBjv7kkSbOXqhp8J8ka4EvAZcALVbV80rqjVbUiyeeAu6vq8Tb+KPB+YAw4t6o+3Mb/CHitqj46xf1soXdUwcjIyBU7d+7sK+/hI8d45bW+Nl1wI0vpTFY4Ne+6lRcOLcu+F4/NuH6+frbz9ZgnJiZYtmzZvOx7rnUpK3Qr7yBZ169fv7eqRqdat2SgVECSZcBngPdV1Q+STDt1irGaYfzUwaptwDaA0dHRGhsbm3VegE/e/xD37Bv4oS+IreuOdyYrnJr34K1jQ8vy7jsfnnH9fP1s5+sxj4+P0+/v/ELrUlboVt75yjrQ1URJXkevCO6vqs+24Vfa6R/a98Nt/BCwetLmq4CXZhiXJC2QQa4mCrAd2F9VH5u0ajdw4oqgzcBDk8bf1a4quho4VlUvA48A1yZZ0V44vraNSZIWyCDHyNcA7wT2JflaG/uvwN3Ag0luA14Abm7rPg/cABwAfgS8B6CqjiT5EPBUm/fBqjoyQC5J0iz1XQbtheDpXiDYMMX8Am6fZl/3Avf2m0WSNBjfgSxJsgwkSZaBJAnLQJKEZSBJwjKQJGEZSJKYg88mkvSz1pzmM5H6tXXd8Rk/b+ng3TfOy/3q7OCRgSTJMpAkWQaSJCwDSRKWgSQJy0CShGUgScIykCRhGUiSsAwkSVgGkiQsA0kSloEkCctAkoRlIEnCMpAk4R+3kTQHTv6DPqf7QzxzxT/oM3c8MpAkWQaSJMtAkoRlIEliEZVBko1JvpXkQJI7h51Hks4mi+JqoiTnAP8T+M/AIeCpJLur6hvDTSZpMTv5KqZ+9XP108/blUyL5cjgSuBAVX27qn4M7AQ2DTmTJJ01UlXDzkCSdwAbq+p32vI7gauq6vdOmrcF2NIWLwW+1eddXgT8U5/bLrQuZYVu5e1SVuhW3i5lhW7lHSTrv6+qN061YlGcJgIyxdgpLVVV24BtA99Z8nRVjQ66n4XQpazQrbxdygrdytulrNCtvPOVdbGcJjoErJ60vAp4aUhZJOmss1jK4ClgbZJLkrweuAXYPeRMknTWWBSniarqeJLfAx4BzgHurarn5vEuBz7VtIC6lBW6lbdLWaFbebuUFbqVd16yLooXkCVJw7VYThNJkobIMpAknV1lkOTeJIeTPDvsLKeTZHWSx5LsT/JckjuGnWk6Sc5N8pUkf9ey/vdhZzoTSc5J8tUknxt2lpkkOZhkX5KvJXl62HlOJ8nyJLuSfLP9/v76sDNNJcml7Wd64usHSd437FwzSfL77f+xZ5M8kOTcOdv32fSaQZK3ABPAfVV12bDzzCTJxcDFVfVMkl8E9gI3LcaP6EgS4PyqmkjyOuBx4I6qemLI0WaU5A+AUeCCqnrbsPNMJ8lBYLSqOvGmqCQ7gP9bVZ9uVweeV1XfH3aumbSPxHmR3ptd/2HYeaaSZCW9/7feVFWvJXkQ+HxV/flc7P+sOjKoqi8BR4ad40xU1ctV9Uy7/UNgP7ByuKmmVj0TbfF17WtRP8tIsgq4Efj0sLP8PElyAfAWYDtAVf14sRdBswH4+8VaBJMsAZYmWQKcxxy+H+usKoOuSrIG+DXgyeEm
mV475fI14DCwp6oWbdbmT4A/BP7fsIOcgQK+kGRv+0iWxew/AN8D/lc7BffpJOcPO9QZuAV4YNghZlJVLwIfBV4AXgaOVdUX5mr/lsEil2QZ8BngfVX1g2HnmU5V/aSq3kzv3eNXJlm0p+GSvA04XFV7h53lDF1TVZcD1wO3t9Odi9US4HLgU1X1a8CrwKL+SPp2KuvtwF8PO8tMkqyg9wGelwC/BJyf5Lfnav+WwSLWzr9/Bri/qj477Dxnop0SGAc2DjnKTK4B3t7Oxe8E3prkL4cbaXpV9VL7fhj4P/Q+5XexOgQcmnRkuIteOSxm1wPPVNUrww5yGr8JfKeqvldV/wp8FviNudq5ZbBItRdltwP7q+pjw84zkyRvTLK83V5K75f2m8NNNb2q+kBVraqqNfROD3yxqubsGdZcSnJ+u4CAdrrlWmDRXg1XVf8IfDfJpW1oA7DoLno4yW+xyE8RNS8AVyc5r/37sIHea4lz4qwqgyQPAF8GLk1yKMltw840g2uAd9J71nri0rcbhh1qGhcDjyX5Or3PmdpTVYv6cs0OGQEeT/J3wFeAh6vqb4ec6XTeC9zffh/eDPyPIeeZVpLz6P1RrUV/5N2OtnYBzwD76P37PWcfTXFWXVoqSZraWXVkIEmammUgSbIMJEmWgSQJy0CShGUgScIykCQB/x8YP8DFUdIyCAAAAABJRU5ErkJggg==",
      "text/plain": [
       "<Figure size 432x288 with 1 Axes>"
      ]
     },
     "metadata": {
      "needs_background": "light"
     },
     "output_type": "display_data"
    }
   ],
   "source": [
    "attack = pd.merge(adult_pii, adult_data, left_on=['DOB'], right_on=['DOB'])\n",
    "attack['Name'].value_counts().hist();"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "pycharm": {
     "name": "#%% md\n"
    }
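   },
   "source": [
    "So dates of birth *alone* are not enough to re-identify most individuals. What if we collect more information to narrow things down further? If we use both date of birth and ZIP code as auxiliary data, re-identification becomes far more effective. In fact, we can successfully re-identify essentially the entire dataset."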
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    },
    "scrolled": true,
    "tags": [
     "hide-input"
    ]
   },
   "outputs": [
    {
     "data": {
      "image/png": "iVBORw0KGgoAAAANSUhEUgAAAYcAAAD8CAYAAACcjGjIAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAFG5JREFUeJzt3W+MXXed3/H3Z21CU9jdBNyOIicrR8Xt1mxKADekAqmzoCZOKtVBYlHSiLhstF6JpAIpDwg8aLaESPAgUCWFVN6NlaRKMRF/apea9VpppnS1TUgWsjFJSjMNYWM3EDUOAYMKMnz7YH6mF//Gnus7M/faM++XdDXnfs/vnPP7TqL7uefcc8epKiRJGvRrk56AJOn0YzhIkjqGgySpYzhIkjqGgySpYzhIkjqGgySpYzhIkjqGgySps3bSExjVunXrasOGDSNt++Mf/5jXvOY1Szuh05w9rw6rrefV1i8srud169axb9++fVW1ZaGxZ2w4bNiwgccee2ykbWdmZpienl7aCZ3m7Hl1WG09r7Z+YfE9J1k3zDgvK0mSOoaDJKljOEiSOoaDJKljOEiSOoaDJKljOEiSOoaDJKljOEiSOmfsN6QX48ChV/gXN//nsR/3uU/807EfU5JG4ZmDJKljOEiSOoaDJKljOEiSOoaDJKljOEiSOoaDJKljOEiSOoaDJKljOEiSOoaDJKmzYDgk+RtJvp7kr5I8meRft/qFSR5JMpvk80nOavVXt+ezbf2GgX19pNW/neTygfqWVptNcvPStylJOhXDnDn8FHhnVb0JuBjYkuRS4JPAp6vqDcDLwPVt/PXAy63+6TaOJJuAq4E3AluAzyZZk2QN8BngCmATcE0bK0makAXDoeYcaU9f1R4FvBP4QqvfC1zVlre257T170qSVt9VVT+tqu8As8Al7TFbVc9W1c+AXW2sJGlChvrMob3Dfxx4EdgP/C/gB1V1tA05CKxvy+uB5wHa+leA1w/Wj9vmRHVJ0oQM9e85VNXPgYuTnAN8GfjtZZ3VCSTZDmwHmJqaYmZmZqT9TJ0NN110dOGBS2zU+S6FI0eOTPT4k2DPK99q6xfG1/Mp/WM/VfWDJA8B/wg4J8nadnZwPnCoDTsEXAAcTLIW+E3gpYH6MYPbnKh+/PF3ADsANm/eXNPT06cy/V+68/7d3H5g/P/O0XPXTo/9mMfMzMww6u/rTGXPK99q6xfG1/Mwdyv9rXbGQJKzgX8CPA08BLynDdsG7G7Le9pz2vr/UlXV6le3u5kuBDYCXwceBTa2u5/OYu5D6z1L0ZwkaTTDvH0+D7i33VX0a8ADVfWVJE8Bu5J8HPgmcHcbfzfw75PMAoeZe7Gnqp5M8gDwFHAUuKFdriLJjcA+YA2ws6qeXLIOJUmnbMFwqKongDfPU3+WuTuNjq//X+D3TrCv24Db5qnvBfYOMV9J0hj4DWlJUsdwkCR1DAdJUsdwkCR1DAdJUsdwkCR1DAdJUsdwkCR1DAdJUsdwkCR1DAdJUsdwkCR1DAdJUsdwkCR1DAdJUsdwkCR1DAdJUsdwkCR1DAdJUsdwkCR1DAdJUsdwkCR1DAdJUmfBcEhyQZKHkjyV5MkkH2z1P0pyKMnj7XHlwDYfSTKb5NtJLh+ob2m12SQ3D9QvTPJIq38+yVlL3agkaXjDnDkcBW6qqk3ApcANSTa1dZ+uqovbYy9AW3c18EZgC/DZJGuSrAE+A1wBbAKuGdjPJ9u+3gC8DFy/RP1JkkawYDhU1QtV9Y22/CPgaWD9STbZCuyqqp9W1XeAWeCS9pitqmer6mfALmBrkgDvBL7Qtr8XuGrUhiRJi3dKnzkk2QC8GXiklW5M8kSSnUnObbX1wPMDmx1stRPVXw/8oKqOHleXJE3I2mEHJnkt8EXgQ1X1wyR3AbcC1X7eDvz+sszy/89hO7AdYGpqipmZmZH2M3U23HTR0YUHLrFR57sUjhw5MtHjT4I9r3yrrV8YX89DhUOSVzEXDPdX1ZcAqur7A+v/GPhKe3oIuGBg8/NbjRPUXwLOSbK2nT0Mjv8V
VbUD2AGwefPmmp6eHmb6nTvv383tB4bOxSXz3LXTYz/mMTMzM4z6+zpT2fPKt9r6hfH1PMzdSgHuBp6uqk8N1M8bGPZu4FtteQ9wdZJXJ7kQ2Ah8HXgU2NjuTDqLuQ+t91RVAQ8B72nbbwN2L64tSdJiDPP2+e3A+4ADSR5vtY8yd7fRxcxdVnoO+EOAqnoyyQPAU8zd6XRDVf0cIMmNwD5gDbCzqp5s+/swsCvJx4FvMhdGkqQJWTAcqurPgcyzau9JtrkNuG2e+t75tquqZ5m7m0mSdBrwG9KSpI7hIEnqGA6SpI7hIEnqGA6SpI7hIEnqGA6SpI7hIEnqGA6SpI7hIEnqGA6SpI7hIEnqGA6SpI7hIEnqGA6SpI7hIEnqGA6SpI7hIEnqGA6SpI7hIEnqGA6SpI7hIEnqGA6SpM6C4ZDkgiQPJXkqyZNJPtjqr0uyP8kz7ee5rZ4kdySZTfJEkrcM7GtbG/9Mkm0D9bcmOdC2uSNJlqNZSdJwhjlzOArcVFWbgEuBG5JsAm4GHqyqjcCD7TnAFcDG9tgO3AVzYQLcArwNuAS45VigtDF/MLDdlsW3Jkka1YLhUFUvVNU32vKPgKeB9cBW4N427F7gqra8Fbiv5jwMnJPkPOByYH9VHa6ql4H9wJa27jeq6uGqKuC+gX1JkibglD5zSLIBeDPwCDBVVS+0Vd8DptryeuD5gc0OttrJ6gfnqUuSJmTtsAOTvBb4IvChqvrh4McCVVVJahnmd/wctjN3qYqpqSlmZmZG2s/U2XDTRUeXcGbDGXW+S+HIkSMTPf4k2PPKt9r6hfH1PFQ4JHkVc8Fwf1V9qZW/n+S8qnqhXRp6sdUPARcMbH5+qx0Cpo+rz7T6+fOM71TVDmAHwObNm2t6enq+YQu68/7d3H5g6FxcMs9dOz32Yx4zMzPDqL+vM5U9r3yrrV8YX8/D3K0U4G7g6ar61MCqPcCxO462AbsH6te1u5YuBV5pl5/2AZclObd9EH0ZsK+t+2GSS9uxrhvYlyRpAoZ5+/x24H3AgSSPt9pHgU8ADyS5Hvgu8N62bi9wJTAL/AR4P0BVHU5yK/BoG/exqjrclj8A3AOcDXy1PSRJE7JgOFTVnwMn+t7Bu+YZX8ANJ9jXTmDnPPXHgN9ZaC6SpPHwG9KSpI7hIEnqGA6SpI7hIEnqGA6SpI7hIEnqGA6SpI7hIEnqGA6SpI7hIEnqGA6SpI7hIEnqGA6SpI7hIEnqGA6SpI7hIEnqGA6SpI7hIEnqGA6SpI7hIEnqGA6SpI7hIEnqGA6SpM6C4ZBkZ5IXk3xroPZHSQ4lebw9rhxY95Eks0m+neTygfqWVptNcvNA/cIkj7T655OctZQNSpJO3TBnDvcAW+apf7qqLm6PvQBJNgFXA29s23w2yZoka4DPAFcAm4Br2liAT7Z9vQF4Gbh+MQ1JkhZvwXCoqq8Bh4fc31ZgV1X9tKq+A8wCl7THbFU9W1U/A3YBW5MEeCfwhbb9vcBVp9iDJGmJrV3EtjcmuQ54DLipql4G1gMPD4w52GoAzx9XfxvweuAHVXV0nvGdJNuB7QBTU1PMzMyMNPGps+Gmi44uPHCJjTrfpXDkyJGJHn8S7HnlW239wvh6HjUc7gJuBar9vB34/aWa1IlU1Q5gB8DmzZtrenp6pP3cef9ubj+wmFwczXPXTo/9mMfMzMww6u/rTGXPK99q6xfG1/NIr5BV9f1jy0n+GPhKe3oIuGBg6PmtxgnqLwHnJFnbzh4Gx0uSJmSkW1mTnDfw9N3AsTuZ9gBXJ3l1kguBjcDXgUeBje3OpLOY+9B6T1UV8BDwnrb9NmD3KHOSJC2dBc8cknwOmAbWJTkI3AJMJ7mYuctKzwF/CFBVTyZ5AHgKOArcUFU/b/u5EdgHrAF2VtWT7RAfBnYl+TjwTeDuJetOkjSSBcOhqq6Zp3zCF/Cqug24bZ76XmDvPPVnmbubSZJ0mvAb0pKkjuEgSeoYDpKkjuEg
SeoYDpKkjuEgSeoYDpKkjuEgSeoYDpKkjuEgSeoYDpKkjuEgSeoYDpKkjuEgSeoYDpKkjuEgSeoYDpKkjuEgSeoYDpKkjuEgSeoYDpKkjuEgSeoYDpKkzoLhkGRnkheTfGug9rok+5M8036e2+pJckeS2SRPJHnLwDbb2vhnkmwbqL81yYG2zR1JstRNSpJOzTBnDvcAW46r3Qw8WFUbgQfbc4ArgI3tsR24C+bCBLgFeBtwCXDLsUBpY/5gYLvjjyVJGrMFw6GqvgYcPq68Fbi3Ld8LXDVQv6/mPAyck+Q84HJgf1UdrqqXgf3AlrbuN6rq4aoq4L6BfUmSJmTtiNtNVdULbfl7wFRbXg88PzDuYKudrH5wnvq8kmxn7oyEqakpZmZmRpv82XDTRUdH2nYxRp3vUjhy5MhEjz8J9rzyrbZ+YXw9jxoOv1RVlaSWYjJDHGsHsANg8+bNNT09PdJ+7rx/N7cfWHTrp+y5a6fHfsxjZmZmGPX3daay55VvtfUL4+t51LuVvt8uCdF+vtjqh4ALBsad32onq58/T12SNEGjhsMe4NgdR9uA3QP169pdS5cCr7TLT/uAy5Kc2z6IvgzY19b9MMml7S6l6wb2JUmakAWvrST5HDANrEtykLm7jj4BPJDkeuC7wHvb8L3AlcAs8BPg/QBVdTjJrcCjbdzHqurYh9wfYO6OqLOBr7aHJGmCFgyHqrrmBKveNc/YAm44wX52AjvnqT8G/M5C85AkjY/fkJYkdQwHSVLHcJAkdQwHSVLHcJAkdQwHSVLHcJAkdQwHSVLHcJAkdQwHSVLHcJAkdQwHSVLHcJAkdQwHSVLHcJAkdQwHSVLHcJAkdQwHSVLHcJAkdQwHSVLHcJAkdQwHSVJnUeGQ5LkkB5I8nuSxVntdkv1Jnmk/z231JLkjyWySJ5K8ZWA/29r4Z5JsW1xLkqTFWoozh9+tqouranN7fjPwYFVtBB5szwGuADa2x3bgLpgLE+AW4G3AJcAtxwJFkjQZy3FZaStwb1u+F7hqoH5fzXkYOCfJecDlwP6qOlxVLwP7gS3LMC9J0pAWGw4F/FmSv0yyvdWmquqFtvw9YKotrweeH9j2YKudqC5JmpC1i9z+HVV1KMnfBvYn+R+DK6uqktQij/FLLYC2A0xNTTEzMzPSfqbOhpsuOrpU0xraqPNdCkeOHJno8SfBnle+1dYvjK/nRYVDVR1qP19M8mXmPjP4fpLzquqFdtnoxTb8EHDBwObnt9ohYPq4+swJjrcD2AGwefPmmp6enm/Ygu68fze3H1hsLp66566dHvsxj5mZmWHU39eZyp5XvtXWL4yv55EvKyV5TZJfP7YMXAZ8C9gDHLvjaBuwuy3vAa5rdy1dCrzSLj/tAy5Lcm77IPqyVpMkTchi3j5PAV9Ocmw//6Gq/jTJo8ADSa4Hvgu8t43fC1wJzAI/Ad4PUFWHk9wKPNrGfayqDi9iXpKkRRo5HKrqWeBN89RfAt41T72AG06wr53AzlHnIklaWn5DWpLUMRwkSR3DQZLUMRwkSR3DQZLUMRwkSR3DQZLUMRwkSR3DQZLUMRwkSR3DQZLUMRwkSR3DQZLUMRwkSR3DQZLUMRwkSR3DQZLUMRwkSR3DQZLUMRwkSR3DQZLUMRwkSR3DQZLUOW3CIcmWJN9OMpvk5knPR5JWs9MiHJKsAT4DXAFsAq5Jsmmys5Kk1eu0CAfgEmC2qp6tqp8Bu4CtE56TJK1ap0s4rAeeH3h+sNUkSROwdtITOBVJtgPb29MjSb494q7WAf9naWY1vHxy3Ef8FRPpecLseeVbbf3C4noeervTJRwOARcMPD+/1X5FVe0Adiz2YEkeq6rNi93PmcSeV4fV1vNq6xfG1/PpclnpUWBjkguTnAVcDeyZ8JwkadU6Lc4cqupokhuBfcAaYGdVPTnhaUnSqnVahANAVe0F9o7pcIu+NHUGsufVYbX1vNr6hTH1nKoax3EkSWeQ0+UzB0nS
aWTFhkOSnUleTPKtE6xPkjvan+t4Islbxj3HpTZEz9e2Xg8k+Yskbxr3HJfaQj0PjPuHSY4mec+45rZchuk5yXSSx5M8meS/jnN+y2GI/7d/M8l/SvJXref3j3uOSynJBUkeSvJU6+eD84xZ1tewFRsOwD3AlpOsvwLY2B7bgbvGMKfldg8n7/k7wD+uqouAW1kZ12vv4eQ9H/vzLJ8E/mwcExqDezhJz0nOAT4L/LOqeiPwe2Oa13K6h5P/d74BeKqq3gRMA7e3Ox/PVEeBm6pqE3ApcMM8f1JoWV/DVmw4VNXXgMMnGbIVuK/mPAyck+S88cxueSzUc1X9RVW93J4+zNz3Sc5oQ/x3BviXwBeBF5d/RstviJ7/OfClqvrrNv6M73uIngv49SQBXtvGHh3H3JZDVb1QVd9oyz8Cnqb/qxHL+hq2YsNhCKv9T3ZcD3x10pNYbknWA+9mZZwZDuvvAucmmUnyl0mum/SExuDfAn8f+N/AAeCDVfWLyU5paSTZALwZeOS4Vcv6Gnba3Mqq8Unyu8yFwzsmPZcx+DfAh6vqF3NvKleFtcBbgXcBZwP/PcnDVfU/JzutZXU58DjwTuDvAPuT/Leq+uFkp7U4SV7L3Fnvh8bdy2oOh6H+ZMdKk+QfAH8CXFFVL016PmOwGdjVgmEdcGWSo1X1Hyc7rWV1EHipqn4M/DjJ14A3ASs5HN4PfKLm7s2fTfId4LeBr092WqNL8irmguH+qvrSPEOW9TVsNV9W2gNc1z7xvxR4papemPSkllOS3wK+BLxvhb+L/KWqurCqNlTVBuALwAdWeDAA7AbekWRtkr8JvI25a9Yr2V8zd6ZEking7wHPTnRGi9A+O7kbeLqqPnWCYcv6GrZizxySfI65uxbWJTkI3AK8CqCq/h1z38a+EpgFfsLcO48z2hA9/yvg9cBn2zvpo2f6Hy0boucVZ6Geq+rpJH8KPAH8AviTqjrprb6nuyH+O98K3JPkABDmLiWeyX+t9e3A+4ADSR5vtY8CvwXjeQ3zG9KSpM5qvqwkSToBw0GS1DEcJEkdw0GS1DEcJEkdw0GS1DEcJEkdw0GS1Pl/3A/Nm4ARu9YAAAAASUVORK5CYII=",
      "text/plain": [
       "<matplotlib.figure.Figure at 0x10ad5b668>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "attack = pd.merge(adult_pii, adult_data, left_on=['DOB', 'Zip'], right_on=['DOB', 'Zip'])\n",
    "attack['Name'].value_counts().hist();"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "pycharm": {
     "name": "#%% md\n"
    }
   },
   "source": [
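    "When we use both pieces of information in the re-identification attack, we can re-identify **essentially all of the individuals**. This is a surprising result, since we generally assume that many people share the same date of birth and that many people live in the same ZIP code. It turns out that the *combination* of these attributes is **extremely** selective. According to Latanya Sweeney's work {cite}`identifiability`, the combination of date of birth, gender, and ZIP code uniquely re-identifies 87% of the US population."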
   ]
  },
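  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The linking step can be illustrated in isolation. The following is a minimal sketch on two small made-up tables: `aux`, `deid`, and every value in them are hypothetical and are not taken from the dataset. It joins the auxiliary data to the de-identified data on `DOB` and `Zip`, then counts how many records each name matches."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "\n",
    "# Hypothetical auxiliary data: names together with quasi-identifiers.\n",
    "aux = pd.DataFrame({\n",
    "    'Name': ['Alice', 'Bob'],\n",
    "    'DOB': ['1/1/1980', '2/3/1975'],\n",
    "    'Zip': [10001, 20002],\n",
    "})\n",
    "\n",
    "# Hypothetical de-identified data: quasi-identifiers plus a sensitive column.\n",
    "deid = pd.DataFrame({\n",
    "    'DOB': ['1/1/1980', '2/3/1975', '2/3/1975'],\n",
    "    'Zip': [10001, 20002, 30003],\n",
    "    'Income': ['<=50K', '>50K', '<=50K'],\n",
    "})\n",
    "\n",
    "# Join on both quasi-identifiers; each match links a name to a sensitive value.\n",
    "linked = pd.merge(aux, deid, on=['DOB', 'Zip'])\n",
    "\n",
    "# A count of 1 means the name is linked to exactly one record.\n",
    "matches = linked['Name'].value_counts()\n",
    "print(linked)\n",
    "print(matches)"
   ]
  },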
  {
   "cell_type": "markdown",
   "metadata": {
    "pycharm": {
     "name": "#%% md\n"
    }
   },
   "source": [
    "Let's check whether we have really re-identified *everyone* by printing the number of data records that each identity could be linked to."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    },
    "tags": [
     "hide-input"
    ]
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Barnabe Haime       2\n",
       "Antonin Chittem     2\n",
       "Penelope Fauning    1\n",
       "Sylvia Kenan        1\n",
       "Sadella Gutowski    1\n",
       "Name: Name, dtype: int64"
      ]
     },
     "execution_count": 18,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "attack['Name'].value_counts().head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "pycharm": {
     "name": "#%% md\n"
    }
   },
   "source": [
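    "It looks like two individuals escaped unique re-identification: each of them matches two records. In other words, in this dataset only **two individuals** have a combination of ZIP code and date of birth that is not unique."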
   ]
  },
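  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The same check can be phrased as a count of quasi-identifier combinations. The following is a minimal sketch on a small made-up table: `toy` and its values are hypothetical and are not taken from the dataset. It counts how many records share each `(DOB, Zip)` pair and lists the records whose pair is not unique."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "\n",
    "# Hypothetical toy records; not taken from the chapter's dataset.\n",
    "toy = pd.DataFrame({\n",
    "    'Name': ['A', 'B', 'C', 'D'],\n",
    "    'DOB': ['1/1/1980', '1/1/1980', '2/3/1975', '4/5/1990'],\n",
    "    'Zip': [10001, 10001, 20002, 30003],\n",
    "})\n",
    "\n",
    "# For every record, the number of records sharing its (DOB, Zip) pair.\n",
    "group_sizes = toy.groupby(['DOB', 'Zip'])['Name'].transform('size')\n",
    "\n",
    "# Records with a shared pair cannot be uniquely re-identified by this attack.\n",
    "ambiguous = toy[group_sizes > 1]\n",
    "\n",
    "# Fraction of records whose pair is unique.\n",
    "unique_fraction = (group_sizes == 1).mean()\n",
    "print(ambiguous['Name'].tolist(), unique_fraction)"
   ]
  },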
  {
   "cell_type": "markdown",
   "metadata": {
    "pycharm": {
     "name": "#%% md\n"
    }
   },
   "source": [
    "## Aggregation"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "pycharm": {
     "name": "#%% md\n"
    }
   },
   "source": [
    "Another way to prevent the release of private information is to release only *aggregate* data."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "38.58164675532078"
      ]
     },
     "execution_count": 21,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "adult['Age'].mean()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "pycharm": {
     "name": "#%% md\n"
    }
   },
   "source": [
    "### Problem of Small Groups"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "pycharm": {
     "name": "#%% md\n"
    }
   },
   "source": [
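    "In many cases, we need to split the data into groups and release aggregate statistics for each group. For example, we might want to know the average age of individuals at each education level."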
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 34,
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Age</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Education-Num</th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>42.764706</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>46.142857</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>42.885886</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                     Age\n",
       "Education-Num           \n",
       "1              42.764706\n",
       "2              46.142857\n",
       "3              42.885886"
      ]
     },
     "execution_count": 34,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "adult[['Education-Num', 'Age']].groupby('Education-Num').mean().head(3)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "pycharm": {
     "name": "#%% md\n"
    }
   },
   "source": [
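    "Aggregating data is generally thought to improve privacy, because it is hard to identify the contribution of a particular individual to an aggregate statistic. But what if a group contains *just one individual*? In that case, the aggregate statistic reveals that individual's age *exactly* and provides no privacy protection at all! In our dataset, most individuals have a unique ZIP code. So if we compute the average age by ZIP code, most of the \"averages\" directly reveal the age of a single individual."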
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 35,
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Age</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Zip</th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>55.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>12</th>\n",
       "      <td>24.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>16</th>\n",
       "      <td>59.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>17</th>\n",
       "      <td>42.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>18</th>\n",
       "      <td>24.0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "      Age\n",
       "Zip      \n",
       "4    55.0\n",
       "12   24.0\n",
       "16   59.0\n",
       "17   42.0\n",
       "18   24.0"
      ]
     },
     "execution_count": 35,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "adult[['Zip', 'Age']].groupby('Zip').mean().head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "pycharm": {
     "name": "#%% md\n"
    }
   },
   "source": [
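    "The US Census Bureau, for example, releases aggregate statistics at the [*block level*](https://www.census.gov/newsroom/blogs/random-samplings/2011/07/what-are-census-blocks.html). Some census blocks have large populations, but some have a population of zero! The situation in which aggregate statistics fail to hide information about individuals in small groups turns out to be quite common.\n",
    "\n",
    "How large does a group need to be before aggregate statistics hide information about individuals? This is hard to answer, because the answer depends on the data itself and on the specific attack. It is therefore hard to be confident that aggregate statistics really preserve privacy. Moreover, as we will see next, attacks that recover information about individuals from aggregate results remain possible even when the groups are large."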
   ]
  },
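  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "One way to see the small-group problem is to report the size of each group next to its mean. The following is a minimal sketch on a small made-up table: `toy` and its values are hypothetical and are not taken from the dataset. It flags the groups of size 1, whose \"mean\" is simply one individual's exact age."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "\n",
    "# Hypothetical toy records; not taken from the chapter's dataset.\n",
    "toy = pd.DataFrame({\n",
    "    'Zip': [4, 12, 12, 16, 17, 17, 17],\n",
    "    'Age': [55, 24, 30, 59, 42, 38, 40],\n",
    "})\n",
    "\n",
    "# Compute the mean together with the group size so small groups are visible.\n",
    "stats = toy.groupby('Zip')['Age'].agg(['mean', 'size'])\n",
    "\n",
    "# A group of size 1 has a mean equal to a single individual's age.\n",
    "singletons = stats[stats['size'] == 1]\n",
    "print(singletons)"
   ]
  },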
  {
   "cell_type": "markdown",
   "metadata": {
    "pycharm": {
     "name": "#%% md\n"
    }
   },
   "source": [
    "### Differencing Attacks"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "pycharm": {
     "name": "#%% md\n"
    }
   },
   "source": [
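    "The privacy problems of aggregation get worse when multiple aggregate statistics are released over the same data. For example, consider the following two summation queries over a large group in our dataset (the first over the whole dataset, the second over all records except one):"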
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 36,
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "1256257"
      ]
     },
     "execution_count": 36,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "adult['Age'].sum()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 37,
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "1256218"
      ]
     },
     "execution_count": 37,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "adult[adult['Name'] != 'Karrie Trusslove']['Age'].sum()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "pycharm": {
     "name": "#%% md\n"
    }
   },
   "source": [
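    "If we know the answers to both queries, we can simply subtract one from the other and determine Karrie's age exactly! This attack works even when the aggregate statistics are released over *very large groups*."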
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 38,
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    },
    "scrolled": true
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "39"
      ]
     },
     "execution_count": 38,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "adult['Age'].sum() - adult[adult['Name'] != 'Karrie Trusslove']['Age'].sum()"
   ]
  },
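  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The attack generalizes to any target record. The following is a minimal sketch on a small made-up table: `toy`, `sum_query`, and `differencing_attack` are hypothetical names introduced here for illustration. It issues the two summation queries and returns their difference, which is the target's value."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "\n",
    "# Hypothetical toy records; not taken from the chapter's dataset.\n",
    "toy = pd.DataFrame({\n",
    "    'Name': ['Alice', 'Bob', 'Carol', 'Dave'],\n",
    "    'Age': [34, 51, 29, 46],\n",
    "})\n",
    "\n",
    "def sum_query(df, exclude=None):\n",
    "    # Aggregate query: total age, optionally leaving out one named record.\n",
    "    if exclude is not None:\n",
    "        df = df[df['Name'] != exclude]\n",
    "    return int(df['Age'].sum())\n",
    "\n",
    "def differencing_attack(df, target):\n",
    "    # The difference between the two aggregate answers isolates the target.\n",
    "    return sum_query(df) - sum_query(df, exclude=target)\n",
    "\n",
    "print(differencing_attack(toy, 'Carol'))"
   ]
  },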
  {
   "cell_type": "markdown",
   "metadata": {
    "pycharm": {
     "name": "#%% md\n"
    }
   },
   "source": [
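    "The following problems will recur throughout this book.\n",
    "\n",
    "- Releasing *data* that is highly useful makes protecting privacy harder.\n",
    "- It is hard to distinguish *malicious* queries from *non-malicious* ones."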
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "pycharm": {
     "name": "#%% md\n"
    }
   },
   "source": [
    "## Summary"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "pycharm": {
     "name": "#%% md\n"
    }
   },
   "source": [
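    "- A *linking attack* combines *auxiliary data* with *de-identified data* to *re-identify* individuals.\n",
    "- The simplest way to carry out a linking attack is to *join* the two data tables.\n",
    "- Even simple linking attacks are very effective:\n",
    "  - A single auxiliary data point is enough to narrow the attack down to a few records\n",
    "  - The narrowed-down set of records helps reveal which additional auxiliary data would be useful for continuing the attack\n",
    "  - For a given dataset, two data points are generally enough to re-identify the vast majority of individuals\n",
    "  - Three data points (gender, ZIP code, date of birth) uniquely re-identify 87% of the US population"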
   ]
  }
 ],
 "metadata": {
  "celltoolbar": "Tags",
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.10"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
